perm filename V234.TEX[TEX,DEK]4 blob sn#378548 filedate 1978-09-06 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00015 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00003 00002	\input acphdr % Section 3.4
C00004 00003	%folio 142 galley 1 (C) Addison-Wesley 1978	*
C00018 00004	%folio 144 galley 2 (C) Addison-Wesley 1978	*
C00027 00005	%folio 147 galley 3 (C) Addison-Wesley 1978	*
C00041 00006	%folio 151 galley 4 (C) Addison-Wesley 1978	*
C00052 00007	%folio 156 galley 5 (C) Addison-Wesley 1978	*
C00064 00008	%folio 161 galley 6 (C) Addison-Wesley 1978	*
C00081 00009	%folio 165 galley 7 (C) Addison-Wesley 1978	*
C00092 00010	%folio 173 galley 8 (C) Addison-Wesley 1978	*
C00102 00011	%folio 176 galley 9 (C) Addison-Wesley 1978	*
C00113 00012	%folio 178 galley 10 (C) Addison-Wesley 1978	*
C00123 00013	%folio 182 galley 11 (C) Addison-Wesley 1978	*
C00137 00014	%folio 186 galley 12 (C) Addison-Wesley 1978	*
C00149 00015	\vfill\eject
C00150 ENDMK
C⊗;
\input acphdr % Section 3.4
\runninglefthead{RANDOM NUMBERS} % chapter title
\titlepage\setcount00
\null
\vfill
\tenpoint
\ctrline{SECTION 3.4 of THE ART OF COMPUTER PROGRAMMING}
\ctrline{$\copyright$ 1978 Addison--Wesley Publishing Company, Inc.}
\vfill
\runningrighthead{OTHER TYPES OF RANDOM QUANTITIES}
\section{3.4}
\eject
\setcount0 113
%folio 142 galley 1 (C) Addison-Wesley 1978	*
\sectionbegin{3.4. OTHER TYPES OF RANDOM QUANTITIES}
W{\:cE HAVE NOW SEEN} how to make a computer generate
a sequence of numbers $U↓0$, $U↓1$, $U↓2$, $\ldots$ which behaves as
if each number were independently selected at random between
zero and one with the uniform distribution. Applications of
random numbers often call for other kinds of distributions, however;
for example, if we want to make a random choice from among $k$
alternatives, we want a random {\sl integer} between 1 and $k$.
If some simulation process calls for a random waiting time between
occurrences of independent events, a random number with the
``exponential distribution'' is desired. Sometimes we don't
even want random {\sl numbers\/}---we want a random permutation
(i.e., a random arrangement of $n$ objects) or a random combination
(i.e., a random choice of $k$ objects from a collection of $n$).

In principle, any of these other random quantities may
be obtained from the uniform deviates $U↓0$, $U↓1$, $U↓2$, $\ldotss\,
$. People have devised a number of important ``random tricks''
which may be used to perform these manipulations efficiently
on a computer, and a study of these techniques also gives some
insight into the proper use of random numbers in any Monte Carlo
application.

It is conceivable that someday somebody will invent a random-number
generator that produces one of these other random quantities
{\sl directly}, instead of getting it indirectly via the uniform
distribution. But except for the ``random bit'' generator described
in Section 3.2.2, no direct methods have so far proved to be
practical.

The discussion in the following section assumes the existence
of a random sequence of uniformly distributed real numbers between
zero and one. A new uniform deviate $U$ is generated whenever we
need it. These numbers are usually represented in a computer
word with the decimal point assumed at the left.

\runningrighthead{NUMERICAL DISTRIBUTIONS}
\section{3.4.1}
\sectionskip
\sectionbegin{3.4.1. Numerical Distributions}
This section summarizes the best techniques known
for producing numbers from various important distributions.
Many of the methods were originally suggested by John von Neumann
in the early 1950s, and they have gradually been improved
upon by other people, notably George Marsaglia, J. H. Ahrens,
and U. Dieter.

\subsectionbegin{A. Random choices from a finite set} The
simplest and most common type of distribution required in practice
is a random {\sl integer}. An integer between 0 and 7 can be
extracted from three bits of $U$ on a binary computer; in such
a case, these bits should be extracted from the {\sl most significant}
(left-hand) part of the computer word, since the least significant bits
produced by many random-number
generators are not sufficiently random.
(See the discussion in Section 3.2.1.1.)

In general, to get a random integer $X$ between 0 and $k - 1$,
we can {\sl multiply} by $k$, and let $X = \lfloor kU\rfloor
$. On \MIX, we would write
$$\vcenter{\mixtwo{
LDA⊗U\cr
MUL⊗K\cr}}\eqno(1)$$
and after these two instructions have been executed
the desired integer will appear in register A. If a random integer
between 1 and $k$ is desired, we add one to this result. [The
instruction ``{\tt INCA 1}'' would follow (1).]
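% [Editorial sketch, not part of the manuscript] The effect of (1), and of (1)
% followed by ``INCA 1'', expressed in C.  The helper uniform(), returning a
% uniform deviate in (0,1), is an assumption of this sketch.
%   #include <stdlib.h>
%   static double uniform(void) {                 /* uniform deviate in (0,1) */
%     return (rand() + 1.0) / ((double) RAND_MAX + 2.0);
%   }
%   int rand_int0(int k) {                        /* random integer 0..k-1, as in (1) */
%     return (int) (k * uniform());
%   }
%   int rand_int1(int k) {                        /* random integer 1..k: (1) plus INCA 1 */
%     return 1 + (int) (k * uniform());
%   }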

This method gives each integer with nearly equal probability.
(There is a slight error because the computer word size is finite;
see exercise 2. The error is quite negligible if $k$ is small,
for example if $k/m < 1/10000$.) In a more general situation
we might want to give different weights to different integers.
Suppose that the value $X = x↓1$ is to be obtained with probability
$p↓1$, and $X = x↓2$ with probability $p↓2$, $\ldotss $, $X = x↓k$
with probability $p↓k$. We can generate a uniform number $U$
and let
$$X=\left\{\vcenter{\halign{\lft{$#$}⊗\qquad if $#$\hfill\cr
x↓1,⊗0 ≤ U < p↓1;\cr
x↓2,⊗p↓1 ≤ U < p↓1 + p↓2;\cr
\noalign{\hjust{$\ldots$}}
x↓k,⊗p↓1 + p↓2 + \cdots + p↓{k-1} ≤ U < 1.\cr}}\right.\eqno(2)$$
(Note that $p↓1 + p↓2 + \cdots + p↓k = 1$.)
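% [Editorial sketch, not part of the manuscript] The comparisons of (2) in C:
% x[j] holds x_{j+1} and p[j] holds p_{j+1}, with p[0]+...+p[k-1] = 1; the
% helper uniform() is this sketch's assumed (0,1) generator.
%   #include <stdlib.h>
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   double weighted_choice(int k, const double x[], const double p[]) {
%     double U = uniform(), sum = 0.0;
%     int j;
%     for (j = 0; j < k - 1; j++) {
%       sum += p[j];                              /* sum = p_1 + ... + p_{j+1} */
%       if (U < sum) return x[j];
%     }
%     return x[k - 1];                            /* U fell in the last interval */
%   }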

There is a ``best possible'' way to do the comparisons of $U$
against various values of $p↓1 + p↓2 + \cdots + p↓s$, as implied
in (2); this situation is discussed in Section 2.3.4.5. Special
cases can be handled by more efficient methods; for example,
to obtain one of the eleven values 2, 3, $\ldotss$, 12 with the
respective ``dice'' probabilities ${1\over 36}$, ${2\over 36}$, $\ldotss
$, ${6\over 36}$, $\ldotss$, ${2\over 36}$, ${1\over 36}$, we could
compute two independent random integers between 1 and 6 and
add them together.

However, none of the above techniques is really the fastest general
way to select $x↓1$, $\ldotss$, $x↓k$ with the correct probabilities.
An ingenious way to do the trick has been discovered by A. J. Walker
[{\sl Electronics Letters \bf10},8 (1974), 127--128; {\sl ACM Trans.\
Math.\ Software \bf3} (1977), 253--256]. Suppose we form $kU$ and consider
the integer part $K=\lfloor kU\rfloor$ and fraction part $V=(kU)\mod 1$
separately; for example, after the code (1) we will have $K$ in register A
and $V$ in register X. Then we can always obtain the desired distribution
by doing the operations
$$\hjust{if}\quad V<P↓K\quad\hjust{then}\quad X←x↓{K+1}\quad\hjust{otherwise}\quad
X←Y↓K,\eqno(3)$$
for some appropriate tables $(P↓0,\ldotss,P↓{k-1})$ and $(Y↓0,\ldotss,Y↓{k-1})$.
Exercise 7 shows how such tables can be computed in general.
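% [Editorial sketch, not part of the manuscript] The selection step (3) in C,
% assuming the tables P[0..k-1] and Y[0..k-1] of exercise 7 have already been
% built; x[j] holds x_{j+1}, and uniform() is an assumed (0,1) generator.
%   #include <stdlib.h>
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   double walker_select(int k, const double x[], const double P[], const double Y[]) {
%     double kU = k * uniform();
%     int K = (int) kU;                           /* integer part of kU */
%     double V = kU - K;                          /* fraction part (kU) mod 1 */
%     return (V < P[K]) ? x[K] : Y[K];            /* x[K] plays the role of x_{K+1} */
%   }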

On a binary computer it is usually helpful to assume that $k$ is a power of
2, so that multiplication can be replaced by shifting; this can be done without
loss of generality by introducing additional $x$'s that occur with probability
zero. For example, let's consider dice again; suppose we want $X=j$ to occur
with the following 16 probabilities:
$$\def\\{\hskip5pt}\def\¬{\over36}
\tabskip 0pt plus 100pt \halign to size{\rt{$#=\;$}\tabskip0pt
⊗\ctr{$#$}\\⊗\ctr{$#$}\\⊗\ctr{$#$}\\⊗\ctr{$#$}\\
⊗\ctr{$#$}\\⊗\ctr{$#$}\\⊗\ctr{$#$}\\⊗\ctr{$#$}\\
⊗\ctr{$#$}\\⊗\ctr{$#$}\\⊗\ctr{$#$}\\⊗\ctr{$#$}\\
⊗\ctr{$#$}\\⊗\ctr{$#$}\\⊗\ctr{$#$}\\⊗\ctr{$#$}\tabskip 0pt plus 100pt \cr
j⊗0⊗1⊗2⊗3⊗4⊗5⊗6⊗7⊗8⊗9⊗10⊗11⊗12⊗13⊗14⊗15\cr
\noalign{\penalty1000\vskip 4pt}
p↓j⊗0⊗0⊗1\¬⊗2\¬⊗3\¬⊗4\¬⊗5\¬⊗6\¬⊗5\¬⊗4\¬⊗3\¬⊗2\¬⊗1\¬⊗0⊗0⊗0\cr
\noalign{\vskip 12pt plus 3pt minus 9pt}
\noalign{\hjust to size{We
can do this using (3), if $k=16$ and $x↓{j+1}=j$ for $0≤j<16$, and if the $P$
and $Y$ tables are set up as follows:}}
\noalign{\vskip 12pt plus 3pt minus 9pt}
P↓j⊗0⊗0⊗4\over9⊗8\over9⊗1⊗7\over9⊗1⊗1⊗1⊗7\over9⊗7\over9⊗8\over9⊗4\over9⊗0⊗0⊗0\cr
\noalign{\penalty1000\vskip 4pt}
Y↓j⊗5⊗9⊗7⊗4⊗\ast⊗6⊗\ast⊗\ast⊗\ast⊗8⊗4⊗7⊗10⊗6⊗7⊗8\cr}$$
(When $P↓j=1$, $Y↓j$ is not used.) For example, the value 7 occurs with probability
${1\over16}\cdot\biglp(1-P↓2)+P↓7+(1-P↓{11})+(1-P↓{14})\bigrp={6\over36}$
as required.
It is a peculiar way to throw dice, but the results are indistinguishable from
the real thing.

\subsectionbegin{B. General methods for continuous distributions}
The most general real-valued distribution may be expressed in
terms of its ``distribution function'' $F(x)$, which specifies
the probability that a random quantity $X$ will be less than
or equal to $x$;
$$F(x) = \hjust{probability that}(X ≤ x).\eqno(4)$$
This function always increases monotonically
from zero to one:
$$F(x↓1) ≤ F(x↓2),\quad\hjust{if }x↓1 ≤ x↓2;\qquad F(-∞)
= 0,\qquad F(+∞) = 1.\eqno(5)$$
Examples of distribution functions are given
in Section 3.3.1, Fig.\ 3. If $F(x)$ is continuous and strictly
increasing (so that $F(x↓1) < F(x↓2)$ when $x↓1 < x↓2$), it
takes on all values between zero and one, and there is an {\sl
inverse function} $F↑{-1}(y)$ such that, for $0 < y < 1$,
$$y = F(x)\qquad \hjust{if and only if}\qquad x = F↑{-1}(y).\eqno(6)$$
In general we can compute a random quantity $X$
with the continuous, strictly increasing distribution $F(x)$
by setting
$$X = F↑{-1}(U)\eqno(7)$$
where $U$ is uniform; for
the probability that $X ≤ x$ is the probability
that $F↑{-1}(U) ≤ x$, i.e., the probability that $U ≤ F(x)$,
and this is $F(x)$.

The problem now reduces to one of numerical analysis, namely
to find good methods for evaluating $F↑{-1}(U)$ to the desired
accuracy. Numerical analysis lies outside the scope of this seminumerical
book; yet there are a number of important shortcuts available to speed
up this general approach, and we will consider them here.
%folio 144 galley 2 (C) Addison-Wesley 1978	*
In the first place, if $X↓1$ is a random variable having
the distribution $F↓1(x)$ and if $X↓2$ is an independent random
variable with the distribution $F↓2(x)$, then
$$\eqalign{\max(X↓1, X↓2)\qquad⊗\hjust{has the distribution}\qquad
F↓1(x)F↓2(x),\cr\noalign{\vskip2pt}
\min(X↓1, X↓2)\qquad⊗\hjust{has the distribution}\qquad
F↓1(x) + F↓2(x) - F↓1(x)F↓2(x).\cr}\eqno(8)$$
(See exercise 4.) For example, the uniform
deviate $U$ has the distribution $F(x) = x$, for $0 ≤ x ≤ 1$;
if $U↓1$, $U↓2$, $\ldotss$, $U↓t$ are independent uniform deviates,
then $\max(U↓1, U↓2, \ldotss , U↓t)$ has the distribution function
$F(x) = x↑t$, for $0 < x ≤ 1$. This is the basis of the ``maximum
of $t$'' test given in Section 3.3.2. Note that the inverse
function in this case is $F↑{-1}(y) = \raise 3pt\hjust to 0pt{\hskip
3pt minus 20pt\:lt\hskip 0pt minus 10000pt}\sqrt{y}$. % t-th root
In the special case $t=2$, we see therefore that the two formulas
$$X = \sqrt U\qquad\hjust{and}\qquad X =\max(U↓1,U↓2)\eqno(9)$$
will give equivalent distributions to the random
variable $X$, although this is not obvious at first glance.
Thus we need never take the square root of a uniform deviate; we can use the
maximum of two uniform deviates instead.

The number of tricks like this is endless: {\sl any} algorithm which
employs random numbers as input will give a random quantity
with {\sl some} distribution as output. The problem is to find
general methods for constructing the algorithm, given the distribution
function of the output. Instead of discussing such methods in
purely abstract terms, we shall study how they can be applied
in important cases.

\subsectionbegin{C. The normal distribution} Perhaps
the most important nonuniform, continuous distribution is the
so-called {\sl normal distribution with mean zero and standard
deviation one:}
$$F(x) = {1\over \sqrt{2π}}
\int ↑{x}↓{-∞} e↑{-t↑2/2}\,dt.\eqno(10)$$
The significance of this distribution was indicated
in Section 1.2.10. Note that the inverse function $F↑{-1}$ is
not especially easy to compute; but we shall see that several
other techniques are available.

\yyskip \noindent({\:g1\/}) {\sl The polar method}, due to G. E.
P. Box, M. E. Muller, and G. Marsaglia.\xskip $\biglp$See {\sl Annals Math.\
Stat.\ \bf29} (1958), 610--611; and Boeing Scientific Res.\ Lab.\ report
D1-82-0203 (1962).$\bigrp$

\algbegin Algorithm P (Polar method for normal
deviates). This algorithm calculates two independent normally
distributed variables, $X↓1$ and $X↓2$.

\algstep P1. [Get uniform variables.] Generate
two independent random variables, $U↓1$, $U↓2$, uniformly distributed
between zero and one. Set $V↓1 ← 2U↓1 - 1$, $V↓2 ← 2U↓2 - 1$. (Now
$V↓1$ and $V↓2$ are uniformly distributed between $-1$ and $+1$.
On most computers it will be preferable to have $V↓1$ and $V↓2$
represented in floating-point form at this point.)

\algstep P2. [Compute $S$.] Set $S ← V↓1↑2 + V↓2↑2$.

\algstep P3. [Is $S ≥ 1$?] If $S ≥ 1$, return to step P1. (Steps
P1 through P3 are executed 1.27 times on the average, with a
standard deviation of 0.587; see exercise 6.)

\algstep P4. [Compute $X↓1, X↓2$.] Set $X↓1$, $X↓2$ according
to the following two equations:
$$X↓1 = V↓1\sqrt{-2 \ln S\over S},\qquad X↓2 = V↓2\sqrt{-2 \ln
S\over S}.\eqno(11)$$
These are the normally distributed
variables desired.\quad\blackslug
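% [Editorial sketch, not part of the manuscript] Algorithm P in C; each call
% returns one normal deviate, saving the second deviate of the pair for the
% next call.  The helper uniform() is an assumed (0,1) generator.
%   #include <stdlib.h>
%   #include <math.h>
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   double normal_polar(void) {
%     static int have_saved = 0;
%     static double saved;
%     double V1, V2, S, factor;
%     if (have_saved) { have_saved = 0; return saved; }
%     do {                                        /* steps P1-P3 */
%       V1 = 2.0 * uniform() - 1.0;
%       V2 = 2.0 * uniform() - 1.0;
%       S = V1 * V1 + V2 * V2;
%     } while (S >= 1.0 || S == 0.0);             /* S == 0 guarded for safety */
%     factor = sqrt(-2.0 * log(S) / S);           /* step P4, Eq. (11) */
%     saved = V2 * factor;  have_saved = 1;
%     return V1 * factor;
%   }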

\yyskip To prove the validity of this method, we
use elementary analytic geometry and calculus: If $S < 1$ in
step P3, the point in the plane with Cartesian coordinates $(V↓1,
V↓2)$ is a {\sl random point uniformly distributed inside the
unit circle.} Transforming to polar coordinates $V↓1 = R \cos
\Theta$, $V↓2 = R \sin \Theta$, we find $S = R↑2$, $X↓1 =
\sqrt{-2 \ln S} \cos \Theta$, $X↓2 = \sqrt{-2
\ln S} \sin \Theta$. Using also the polar coordinates $X↓1 = R↑\prime
\cos \Theta↑\prime$, $X↓2 = R↑\prime \sin \Theta↑\prime$, we find that
$\Theta↑\prime = \Theta$ and $R↑\prime = \sqrt{-2 \ln S}$.
It is clear that $R↑\prime$ and $\Theta↑\prime$ are independent, since
$R$ and $\Theta$ are independent inside the unit circle. Also, $\Theta↑\prime$
is uniformly distributed between 0 and $2π$; and the probability
that $R↑\prime ≤ r$ is the probability that $-2\ln S ≤ r↑2$,
i.e., the probability that $S ≥ e↑{-r↑2/2}$. This
equals $1 - e↑{-r↑2/2}$, since $S = R↑2$ is uniformly
distributed between zero and one. The probability that $R↑\prime$
lies between $r$ and $r + dr$ is therefore the derivative of
$1 - e↑{-r↑2/2}$, namely, $re↑{-r↑2/2}\,dr$.
Similarly, the probability that $\Theta↑\prime$ lies between $\theta$
and $\theta + d\theta$ is $(1/2π)\,d\theta $. The joint probability
that $X↓1 ≤ x↓1$ and that $X↓2 ≤ x↓2$ can now be computed; it is
$$\eqalign{\int↓{\{(r,\theta )\,|\, r \cos \theta ≤x↓1,\,r \sin \theta
≤x↓2\}}\?⊗ {1\over 2π} e↑{-r↑2/2}\,r\,dr\,d\theta\cr⊗ = {1\over
2π} \int ↓{\{(x,y)\,|\,x≤x↓1,\,y≤x↓2\}} e↑{-(x↑2+y↑2)/2}
\,dx\,dy\cr
\noalign{\vskip 2pt}
⊗= \bigglp\sqrt{1\over 2π} \int ↑{x↓1}↓{-∞} e↑{-x↑2/2}
\,dx\biggrp\bigglp\sqrt{1\over 2π} \int ↑{x↓2}↓{-∞} e↑{-y↑2/2}
\,dy\biggrp.\cr}$$
This proves that $X↓1$ and $X↓2$ are independent and
normally distributed, as desired.

\yyskip \noindent({\:g2\/}) {\sl The rectangle-wedge-tail method}, introduced
by G. Marsaglia.\xskip In this method we use the distribution
$$F(x) = \sqrt{2\over π} \int ↑{x}↓{0} e↑{-t↑2/2}\,dt,\qquad
x ≥ 0,\eqno(12)$$
so that $F(x)$ gives the distribution of the
{\sl absolute value} of a normal deviate. After $X$ has
been computed according to this distribution, we will attach a random
sign to its value, and this will make it a true normal deviate.
%folio 147 galley 3 (C) Addison-Wesley 1978	*
The rectangle-wedge-tail approach is based on several
important general techniques which we shall explore as we develop
the algorithm. The first key idea is to regard $F(x)$ as a {\sl
mixture} of several other functions, namely to write
$$F(x) = p↓1F↓1(x) + p↓2F↓2(x) + \cdots + p↓nF↓n(x),\eqno(13)$$
where $F↓1$, $F↓2$, $\ldotss$, $F↓n$ are appropriate
distributions and $p↓1$, $p↓2$, $\ldotss$, $p↓n$ are nonnegative probabilities
which sum to 1. If we generate a random variable $X$ by choosing
distribution $F↓j$ with probability $p↓j$, it is easy to see
that $X$ will have distribution $F$ overall. Some of the distributions
$F↓j(x)$ may be rather difficult to handle, even harder than
$F$ itself, but we can usually arrange things so that the probability
$p↓j$ is very small in this case. Most of the distributions
$F↓j(x)$ will be quite easy to accommodate, since they will
be trivial modifications of the uniform distribution. The resulting
method yields an extremely efficient program, since its {\sl
average} running time is very small.

It is easier to understand the method if we work with the {\sl
derivatives} of the distributions instead of the distributions
themselves. Let
$$f(x) = F↑\prime (x),\qquad f↓j(x) = {F↓j}↑\prime(x);$$
these are called the {\sl density} functions of
the probability distributions. Equation (13) becomes
$$f(x) = p↓1f↓1(x) + p↓2f↓2(x) + \cdots + p↓nf↓n(x).\eqno(14)$$
Each $f↓j(x)$ is $≥0$, and the total area under
the graph of $f↓j(x)$ is 1; so there is a convenient graphical way
to display the relation (14): The area under $f(x)$ is divided
into $n$ parts, with the part corresponding to $f↓j(x)$ having
area $p↓j$. See Fig.\ 9, which illustrates the situation in the
case of interest to us here, with $f(x) = F↑\prime (x) = 
\sqrt{2/π}\,e↑{-x↑2/2}$; the area under this
curve has been divided into $n = 31$ parts. There are 15
rectangles, which represent $p↓1f↓1(x)$, $\ldotss$, $p↓{15}f↓{15}(x)$;
there are 15 wedge-shaped pieces, which represent $p↓{16}f↓{16}(x)$,
$\ldotss$, $p↓{30}f↓{30}(x)$; and the remaining part $p↓{31}f↓{31}(x)$
is the ``tail,'' namely the entire graph of $f(x)$ for $x ≥
3$.

\topinsert{\vskip 55mm\baselineskip11pt
\hjust to size{\caption Fig.\ 9. The density function
divided into 31 parts. The area of each part represents the
average number of times a random number with that density is
to be computed.}}

The rectangular parts $f↓1(x)$, $\ldotss
$, $f↓{15}(x)$ represent {\sl uniform distributions.} For example,
$f↓3(x)$ represents a random variable uniformly distributed
between ${2\over 5}$ and ${3\over 5}$. The altitude of $p↓jf↓j(x)$ is
$f(j/5)$, hence the area of the $j$th rectangle is
$$\textstyle p↓j={1\over5}f(j/5)=\dispstyle\sqrt{2\over25π}\,e↑{-j↑2/50},\qquad
\hjust{for }1≤j≤15.\eqno(15)$$
In order to generate such rectangular portions of the distribution, we simply
compute$$\textstyle X={1\over5}U+S,\eqno(16)$$
where $U$ is uniform and $S$ takes the value $(j-1)/5$ with probability $p↓j$.
Since $p↓1+\cdots+p↓{15}=.9183$, we can use simple uniform deviates like this
about 92 percent of the time.

%folio 151 galley 4 (C) Addison-Wesley 1978	*
In the remaining 8 percent, we will usually have to generate one of the
wedge-shaped distributions $F↓{16}$, $\ldotss$, $F↓{30}$. Typical examples of
what we need to do are shown in Fig.\ 10. When $x<1$, the curved
part is concave downward, and when $x > 1$ it is concave upward,
but in each case the curved part is reasonably close to a straight
line, and it can be enclosed in two parallel lines as shown.

To handle these wedge-shaped distributions,
we will rely on yet another general technique, von Neumann's
so-called {\sl rejection method} for obtaining a complicated
density from another one that ``encloses'' it. The polar method
described above is a simple example of such an approach: Steps
P1--P3 obtain a random point inside the unit circle by first
generating a random point in a larger square, rejecting it and starting
over again if the point was outside the circle.

The general rejection method is even more powerful than this.
To generate a random variable $X$ with density $f$, let $g$
be another probability density function such that $$f(t) ≤ cg(t)\eqno(17)$$
for all $t$, where $c$ is a constant. Now generate $X$ according
to density $g$, and also generate an independent uniform deviate
$U$. If $U ≥ f(X)/cg(X)$, reject $X$ and start again with another
$X$ and $U$. When the condition $U < f(X)/cg(X)$ finally occurs,
the resulting $X$ will have density $f$ as desired.\xskip
[{\sl Proof:} $X ≤
x$ will occur with probability $p(x) = \int ↑{x}↓{0} \biglp
g(t)\,dt \cdot f(t)/cg(t)\bigrp+qp(x)$, where $q =
\int ↑{1}↓{0} \biglp g(t)\,dt \cdot (1 - f(t)/cg(t))\bigrp = 1
- 1/c$ is the probability of rejection; hence $p(x) = \int ↑{x}↓{0}
f(t)\,dt$.]
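% [Editorial sketch, not part of the manuscript] One concrete instance of the
% rejection method (17), chosen by the editor for illustration: the density
% f(x) = sqrt(2/pi) exp(-x*x/2) of the absolute value of a normal deviate
% (cf. (12)) is enclosed by c g(x), where g(x) = exp(-x) is the exponential
% density and c = sqrt(2e/pi); then f(X)/(c g(X)) = exp(-(X-1)*(X-1)/2).
% uniform() is an assumed (0,1) generator.
%   #include <stdlib.h>
%   #include <math.h>
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   double half_normal(void) {
%     for (;;) {
%       double X = -log(uniform());               /* X has density g */
%       double U = uniform();
%       if (U < exp(-0.5 * (X - 1.0) * (X - 1.0)))  /* U < f(X)/(c g(X)): accept */
%         return X;
%     }                                           /* otherwise reject and retry */
%   }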

The rejection technique is most efficient when $c$ is small,
since there will be $c$ iterations on the average before a value
is accepted. (See exercise 6.) In some cases $f(x)/cg(x)$ is
always 0 or 1; then $U$ need not be generated. In other cases, if $f(x)/cg(x)$ is
hard to compute, we may know some bounding functions $$r(x) ≤
f(x)/cg(x) ≤ s(x)\eqno(18)$$ that are much simpler, and the exact value
of $f(x)/cg(x)$ need not be calculated unless $r(x) ≤ U < s(x)$.
The following algorithm solves the ``wedge'' problem by developing
the rejection method still further.

\algbegin Algorithm L (Nearly linear densities).
This algorithm may be used to generate a random variable $X$
for any distribution whose density $f(x)$ satisfies the following
conditions (cf.\ Fig.\ 10):
$$\eqalign{f(x)⊗=0,\qquad\qquad\hjust{for $x<s$ and for $x>s+h$;}\cr
\noalign{\vskip 2pt}
\quad a-b(x-s)/h≤f(x)⊗≤b-b(x-s)/h,\qquad\hjust{for }s≤x≤s+h.\cr}\eqno(19)$$

\algstep L1. [Get $U ≤ V$.] Generate two independent
random variables $U$, $V$, uniformly distributed between zero
and one. If $U > V$, exchange $U ↔ V$.

\algstep L2. [Easy case?] If $V ≤ a/b$, go to L4.

\algstep L3. [Try again?] If $V > U + (1/b)f(s + hU)$, go back
to step L1. (If $a/b$ is close to 1, this step of the algorithm
will not be necessary very often.)

\algstep L4. [Compute $X$.] Set $X ← s + hU$.\quad\blackslug

\yyskip When step L4 is reached, the point $(U,
V)$ is a random point in the area shaded in Fig.\ 11, namely,
$0 ≤ U ≤ V ≤ U + (1/b)f(s + hU)$. Conditions (19) ensure that
$${a\over b} ≤ U + {1\over b} f(s + hU) ≤ 1.$$
Now the probability that $X ≤ s + hx$, for $0 ≤
x ≤ 1$, is the ratio of area to the left of the vertical line
$U = x$ in Fig.\ 11 to the total area, namely,
$$\int ↑{x}↓{0} {1\over b} f(s + hu)\,du\;\vcenter{\hjust{\:@\char'36}}
\int ↑{1}↓{0}
{1\over b} f(s + hu)\,du = \int ↑{s+hx}↓{s} f(v)\,dv;$$
therefore $X$ has the correct distribution.

\topinsert{\vskip 45mm
\ctrline{\caption Fig.\ 11. Region of ``acceptance'' in Algorithm L.}}

With appropriate constants $a↓j$,
$b↓j$, $s↓j$, Algorithm L will take care of the wedge-shaped densities
$f↓{j+15}$ of Fig.\ 9, for $1 ≤ j ≤ 15$. The final distribution,
$F↓{31}$, needs to be treated only about one time in 370; it
is used whenever a result $X ≥ 3$ is to be computed. Exercise 11
shows that a standard rejection scheme can be used for this
``tail''; hence we are ready to consider the procedure in its entirety:

\algbegin Algorithm M (Rectangle-wedge-tail method for normal
deviates). This algorithm uses several auxiliary tables
$(P↓0,\ldotss,P↓{31})$, $(Q↓1,\ldotss,Q↓{15})$, $(Y↓0,\ldotss,Y↓{31})$,
$(Z↓0,\ldotss,Z↓{31})$, $(S↓1,\ldotss,S↓{16})$, $(D↓{16},\ldotss,D↓{30})$,
$(E↓{16},\ldotss,E↓{30})$, constructed as explained in exercise 10;
examples appear in Table 1. 
We assume that a binary computer is being used;
a similar procedure could be worked out for decimal machines.

\algstep M1. [Get $U$.]
Generate a uniform random number $U = (.b↓0b↓1b↓2 \ldotsm b↓t)↓2$.
(Here the $b$'s are the bits in the binary representation of
$U$. For reasonable accuracy, $t$ should be at least 24.) Set
$\psi ← b↓0$. (Later, $\psi$ will be used to determine the sign
of the result.)

\algstep M2. [Rectangle?] Set $j←(b↓1b↓2b↓3b↓4b↓5)↓2$, a binary number
determined by the leading bits of $U$, and set $f←(.b↓6b↓7\ldotsm b↓t)↓2$,
the fraction determined by the remaining bits. If $f≥P↓j$, set
$X←Y↓j+fZ↓j$ and go to M9. Otherwise if $j≤15$ (i.e., $b↓1=0$), set
$X←S↓j+fQ↓j$ and go to M9. $\biglp$This
is an adaptation of Walker's method (3).$\bigrp$
%folio 156 galley 5 (C) Addison-Wesley 1978	*
\algstep M3. [Wedge or tail?] (Now $16≤j≤31$, and each particular value
$j$ occurs with probability $p↓j$.) If $j=31$, go to M7.

\algstep M4. [Get $U ≤ V$.] Generate two new uniform
deviates, $U$, $V$; if $U > V$, exchange $U ↔ V$. (We are now
performing Algorithm L.) Set $X ← S↓{j-15} + {1\over 5}U$.

\algstep M5. [Easy case?] If $V ≤ D↓j$, go to M10.

\algstep M6. [Another try?] If $V > U + E↓j (e↑{(S↓{j-14}↑2-X↑2)/2}
- 1)$, go back to step M4; otherwise
go to M9. (This step is executed with low probability.)

\topinsert{\vskip 80mm
\ctrline{\caption Fig.\ 12. The ``rectangle-wedge-tail'' algorithm for
generating normal deviates.}}

\algstep M7. [Get supertail deviate.] Generate two
new independent uniform deviates, $U$, $V$, and set $X ←
\sqrt{9 - 2 \ln V}$.

\algstep M8. [Reject?] If $UX ≥ 3$, go back to step
M7. (This will occur only about one-twelfth of the time we reach step M8.)

\algstep M9. [Attach sign.] If $\psi = 1$, set $X
← -X$.\quad\blackslug

\yyskip This algorithm is a very
pretty example of mathematical theory intimately interwoven
with programming ingenuity---a fine illustration of the
art of computer programming! Only steps M1, M2, and M9 need to be
performed most of the time, and the other steps aren't terribly
slow either. The first publications of the rectangle-wedge-tail method
were by G. Marsaglia, {\sl Annals Math.\ Stat.\ \bf32} (1961), 894--899;
G. Marsaglia, M. D. MacLaren, and T. A. Bray, {\sl CACM \bf7} (1964), 4--10.
Further refinements of Algorithm M have been developed by G. Marsaglia, K.
Ananthanarayanan, and N. J. Paul, {\sl Inf.\ Proc.\ Letters \bf5} (1976),
27--30.

\topinsert{\tablehead{Table 1}
\ctrline{EXAMPLE OF TABLES USED WITH ALGORITHM M*}
\vskip 4pt \hrule
\eightpoint\tabskip 0pt plus 10pt
\halign to size{\rt{#}
⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{$#$}⊗\rt{$#$}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}\cr
\noalign{\vskip 2pt}
$j$\hfill⊗$P↓j$\hfill⊗$P↓{j+16}$\hfill⊗$Q↓j$\hfill⊗Y↓j\hfill⊗Y↓{j+16}\hfill
⊗$Z↓j$\hfill⊗$Z↓{j+16}$\hfill⊗$S↓{j+1}$\hfill⊗$D↓{j+15}$\hfill⊗$E↓{j+15}$\hfill\cr
\noalign{\vskip 2pt\hrule\vskip 2pt}
0⊗.000⊗.067⊗⊗0.00⊗0.59⊗0.20⊗0.21⊗0.0\cr
1⊗.849⊗.161⊗.236⊗-0.92⊗0.96⊗1.32⊗0.24⊗0.2⊗.505⊗25.00\cr
2⊗.970⊗.236⊗.206⊗-5.86⊗-0.06⊗6.66⊗0.26⊗0.4⊗.773⊗12.50\cr
3⊗.855⊗.285⊗.234⊗-0.58⊗0.12⊗1.38⊗0.28⊗0.6⊗.876⊗8.33\cr
4⊗.994⊗.308⊗.201⊗-33.13⊗1.31⊗34.93⊗0.29⊗0.8⊗.939⊗6.25\cr
5⊗.995⊗.304⊗.201⊗-39.55⊗0.31⊗41.35⊗0.29⊗1.0⊗.986⊗5.00\cr
6⊗.933⊗.280⊗.214⊗-2.57⊗1.12⊗2.97⊗0.28⊗1.2⊗.995⊗4.06\cr
7⊗.923⊗.241⊗.217⊗-1.61⊗0.54⊗2.61⊗0.26⊗1.4⊗.987⊗3.37\cr
8⊗.727⊗.197⊗.275⊗0.67⊗0.75⊗0.73⊗0.25⊗1.6⊗.979⊗2.86\cr
9⊗1.000⊗.152⊗.200⊗0.00⊗0.56⊗0.00⊗0.24⊗1.8⊗.972⊗2.47\cr
10⊗.691⊗.112⊗.289⊗0.35⊗0.17⊗0.65⊗0.23⊗2.0⊗.966⊗2.16\cr
11⊗.454⊗.079⊗.440⊗-0.17⊗0.38⊗0.37⊗0.22⊗2.2⊗.960⊗1.92\cr
12⊗.287⊗.052⊗.698⊗0.92⊗-0.01⊗0.28⊗0.21⊗2.4⊗.954⊗1.71\cr
13⊗.174⊗.033⊗1.150⊗0.36⊗0.39⊗0.24⊗0.21⊗2.6⊗.948⊗1.54\cr
14⊗.101⊗.020⊗1.974⊗-0.02⊗0.20⊗0.22⊗0.20⊗2.8⊗.942⊗1.40\cr
15⊗.057⊗.086⊗3.526⊗0.19⊗0.78⊗0.21⊗0.22⊗3.0⊗.936⊗1.27\cr}
\vskip 2pt\hrule\vskip 5pt
\hjust to size{*In practice, this data would be given with much greater
precision; the table shows only enough figures so that an interested reader
may test his own algorithm for computing the values more accurately.}}

\yyskip \noindent({\:g3\/}) {\sl The odd-even method}, due to G. E. Forsythe.\xskip
An amazingly simple technique for generating random deviates with a density
of the general exponential form
$$f(x) = Ce↑{-h(x)},\quad\hjust{for }a ≤ x < b,\qquad f(x) = 0\quad
\hjust{otherwise},\eqno(20)$$
when
$$0 ≤ h(x) ≤ 1\qquad\hjust{for }a ≤ x < b,\eqno(21)$$
was discovered by John von Neumann and G. E.
Forsythe about 1950. The idea is based on the rejection method
described earlier, letting $g(x)$ be the uniform distribution
on $[a,b)$: We set $X ← a + (b - a)U$, where $U$ is a uniform
deviate, and then we want to accept $X$ with probability $e↑{-h(X)}$.
The latter operation could be done by testing $e↑{-h(X)}$ vs.\
$V$, or $h(X)$ vs. $-\ln V$, when $V$ is another uniform deviate,
but the job can be done without applying any transcendental
functions in the following interesting way. Set $V↓0 ← h(X)$,
then generate uniform deviates $V↓1$, $V↓2$, $\ldots$ until finding
some $K ≥ 1$ with $V↓{K-1} < V↓K$. For fixed $X$ and $k$, the
probability that $h(X) ≥ V↓1 ≥ \cdots ≥ V↓k$ is $1/k!$ times
the probability that $\max (V↓1, \ldotss , V↓k)≤ h(X)$,
namely $h(X)↑k/k!$; hence the probability that $K = k$ is $h(X)↑{k-1}/(k
- 1)! - h(X)↑k/k!$, and the probability that $K$ is odd is
$$\sum ↓{\scriptstyle k\,\lower1.5pt\null\hjust{\:d odd}\atop\scriptstyle k≥1}
\left({h(X)↑{k-1}\over (k - 1)!} - {h(X)↑k\over
k!}\right) = e↑{-h(X)}.$$
Therefore we reject $X$ and try again if $K$ is
even; we accept $X$ as a random variable with density (20) if
$K$ is odd. Note that we usually won't have to generate many
$V$'s in order to determine $K$, since the average value of
$K$ (given $X$) is $\sum ↓{k≥0}\hjust{probability that}(K > k) 
= \sum ↓{k≥0} h(X)↑k/k!
= e↑{h(X)} ≤ e$.
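% [Editorial sketch, not part of the manuscript] The von Neumann--Forsythe
% comparison trick in C: the value 1 is returned with probability exp(-h),
% 0 <= h <= 1, using no transcendental functions.  uniform() is an assumed
% (0,1) generator.
%   #include <stdlib.h>
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   int accept_with_prob_exp(double h) {
%     double Vprev = h, V;                        /* V_0 = h(X) */
%     int K;
%     for (K = 1; ; K++) {
%       V = uniform();                            /* V_K */
%       if (Vprev < V) break;                     /* first K with V_{K-1} < V_K */
%       Vprev = V;
%     }
%     return (K % 2) == 1;                        /* accept if K is odd */
%   }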

\vskip 2pt
Forsythe realized some years later that this approach leads
to an efficient method for calculating normal deviates, without
the need for any auxiliary routines to calculate square roots
or logarithms as in Algorithms P and M\null. His procedure, with
an improved choice of intervals $[a,b)$ due to J. H. Ahrens
and U. Dieter, can be summarized as follows.

\algbegin Algorithm F (Odd-even method for normal
deviates). This algorithm generates normal deviates
on a binary computer, assuming approximately $t + 1$ bits of
accuracy. The algorithm requires
a table of values $d↓j = a↓j - a↓{j - 1}$, for $1 ≤ j ≤ t + 1$,
where $a↓j$ is defined by the relation
$$\sqrt{2\over π} \int ↑{∞}↓{a↓j} e↑{-x↑2/2}\,dx = {1\over2↑j}.\eqno(23)$$

\algstep F1. [Get $U$.] Generate a uniform
random number $U = (.b↓0b↓1 \ldotsm b↓t)↓2$, where $b↓0$, $\ldotss$,
$b↓t$ denote the bits in binary notation. Set $\psi ← b↓0$, $j
← 1$, and $a ← 0$.

\algstep F2. [Find first zero $b↓j$.] If $b↓j = 1$, set $a ←
a + d↓j$, $j ← j + 1$, and {\sl repeat} this step. (If $j = t
+ 1$, treat $b↓j$ as zero.)

\algstep F3. [Generate candidate.] $\biglp$Now $a = a↓{j-1}$, and the
current value of $j$ occurs with probability $\approx 1/2↑j$.
We will generate $X$ in the range $[a↓{j-1}, a↓j)$, using the
rejection method described above, with $h(x) = x↑2/2 -a↑2/2 = y↑2/2
+ ay$ where $y = x - a$. Exercise 12 proves that $h(x) ≤ 1$
as required in (21).$\bigrp$\xskip Set $Y ← d↓j$ times $(.b↓{j+1} \ldotsm b↓t)↓2$
and $V ← ({1\over 2}Y + a)Y$. (Since the average value of $j$
is 2, there will usually be enough significant bits in $(.b↓{j+1}
\ldotsm b↓t)↓2$ to provide decent accuracy. The calculations are
readily done in fixed point arithmetic.)

\algstep F4. [Reject?] Generate a uniform deviate $U$.
If $V < U$, go on to step F5. Otherwise set $V$ to a new uniform
deviate; and if now $U < V$ (i.e., if $K$ is even in the discussion
above), go back to F3, otherwise repeat step F4.

\algstep F5. [Return $X$.] Set $X←a+Y$. If $\psi=1$, set $X←-X$.\quad\blackslug
%folio 161 galley 6 (C) Addison-Wesley 1978	*
\yyskip Values of $d↓j$ for $1 ≤ j ≤ 47$ appear in a paper by
Ahrens and Dieter, {\sl Math.\ Comp.\ \bf 27} (1973), 927--937;
their paper discusses refinements of the algorithm which improve
its speed at the expense of more tables. Algorithm F is attractive
since it is almost as fast as Algorithm M and it is easier
to implement. The average number of uniform deviates required
per normal deviate is 2.53947; R. P. Brent [{\sl CACM \bf 17} (1974),
704--705] has shown how to reduce this number to 1.37446 at
the expense of two subtractions and one division per uniform
deviate saved.

\yyskip\noindent ({\:g4\/}) {\sl Ratios of uniform deviates.}\xskip
There is yet another good way to generate normal deviates, discovered
by A. J. Kinderman and J. F. Monahan in 1976. Their idea is to
generate a random point $(U,V)$ in the region defined by
$$0<u≤1,\qquad-2u\sqrt{\ln(1/u)}≤v≤2u\sqrt{\ln(1/u)},\eqno(24)$$
and then to output the ratio $X←V/U$. The shaded area of Fig.\ 13
is the magic region (24) which makes this all work.  Before we
study the associated theory, let us first state the algorithm
so that its comparative efficiency and simplicity are manifest:

\algbegin Algorithm R (Ratio method for normal deviates). This algorithm
generates normal deviates $X$.

\algstep R1. [Get $U,V$.] Generate two independent uniform deviates $U$ and
$V$, where $U$ is nonzero, and set $X←\sqrt{8/e}\,\left(V-{1\over2}\right)/U$.
(Now $X$ is the ratio of the coordinates $\biglp U,\sqrt{8/e}\,
\left(V-{1\over2}\right)\bigrp$
of a random point in the rectangle which encloses the shaded region
in Fig.\ 13. We will accept $X$ if the corresponding point actually lies
``in the shade,'' otherwise we will try again.)

\topinsert{\vjust to 100mm{\vfill\baselineskip 11pt
\jpar1000
\moveright 21pc\hjust to 6.8pc{\caption Fig.\ 13. Region of
``acceptance'' in the ratio - of - uniforms method for normal
deviates. Lengths of lines with coordinate ratio $x$
have the normal distribution.}\vskip 20pt}}

\algstep R2. [Optional upper bound test.] If $X↑2≤5-4e↑{1/4}U$, output $X$
and terminate the algorithm. (This step can be omitted if desired; it tests
whether or not the selected point is in the interior region of Fig.\ 13,
making it unnecessary to calculate a logarithm.)

\algstep R3. [Optional lower bound test.] If $X↑2≥4e↑{-1.35}/U+1.4$, go back
to R1. (This step also could be omitted; it tests whether or not the selected
point is outside the exterior region of Fig.\ 13, making it unnecessary to
calculate a logarithm.)

\algstep R4. [Final test.] If $X↑2≤-4\ln U$, output $X$ and terminate the
algorithm. Otherwise go back to R1.\quad\blackslug
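% [Editorial sketch, not part of the manuscript] Algorithm R in C with both of
% the optional tests R2 and R3 included.  uniform() is an assumed (0,1)
% generator (hence U is never zero).
%   #include <stdlib.h>
%   #include <math.h>
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   double normal_ratio(void) {
%     const double c = sqrt(8.0 / exp(1.0));      /* sqrt(8/e) */
%     for (;;) {
%       double U = uniform(), V = uniform();
%       double X = c * (V - 0.5) / U;             /* step R1 */
%       double XX = X * X;
%       if (XX <= 5.0 - 4.0 * exp(0.25) * U) return X;      /* R2: quick accept */
%       if (XX >= 4.0 * exp(-1.35) / U + 1.4) continue;     /* R3: quick reject */
%       if (XX <= -4.0 * log(U)) return X;                  /* R4: final test   */
%     }
%   }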

\yyskip Exercises 20 and 21 work out the timing analysis; four different
algorithms are analyzed, since steps R2 and R3 can be included or omitted
depending on one's preference.  The following table shows how many times
each step will be performed, on the average, depending on which of the
optional tests is applied:
$$\vcenter{\halign{\ctr{#}\quad⊗\ctr{#}\quad⊗\ctr{#}\quad⊗\ctr{#}\quad⊗\ctr{#}\cr
Step⊗Neither⊗R2 only⊗R3 only⊗Both\cr
\noalign{\vskip 2pt}
R1⊗1.369⊗1.369⊗1.369⊗1.369\cr
R2⊗0⊗1.369⊗0⊗1.369\cr
R3⊗0⊗0⊗1.369⊗0.467\cr
R4⊗1.369⊗0.467⊗1.134⊗0.232\cr}}\eqno(25)$$
Thus it pays to omit the optional tests if there is a very fast logarithm
operation, but if the log routine is rather slow it pays to include them.

But why does it work? One reason is that we can calculate the probability
that $X≤x$, and it turns out to be the correct value (10).  But such a
calculation isn't very easy unless one happens to hit on the right ``trick,''
and anyway it's better to understand how the algorithm might have been
discovered in the first place. Kinderman and Monahan derived it by working
out a theory that can be used with any well-behaved density function $f(x)$
[cf.\ {\sl ACM Trans.\ Math.\ Software \bf3} (1977), 257--260].

In general, suppose that a point $(U,V)$ has been generated
uniformly over the region
of the $(u,v)$-plane defined by
$$u>0,\qquad u↑2≤g(v/u)\eqno(26)$$
for some nonnegative integrable function $g$. If we set $X←V/U$, the
probability that $X≤x$ can be calculated by integrating $du\,dv$ over the
region defined by the two relations in (26) plus the auxiliary condition
$v/u≤x$, then dividing by the same integral without this extra condition.
Letting $v=tu$ so that $dv=u\,dt$, the integral becomes
$$\lower12pt\null % this puts extra space below the formula
\int↓{-∞}↑x\,dt\int↓0↑{\sqrt{g(t)}}u\,du={1\over2}\int↑x↓{-∞}g(t)\,dt.$$
Hence the probability that $X≤x$ is
$$\int↓{-∞}↑xg(t)\,dt\;\vcenter{\hjust{\:@\char'36}}\int↓{-∞}↑{+∞}
g(t)\,dt.\eqno(27)$$

The normal distribution comes out when $g(t)=e↑{-t↑2/2}$; and the condition
$u↑2≤g(v/u)$ simplifies in this case to $(v/u)↑2≤-4\ln u$. It is easy to see
that the set of all $(u,v)$ satisfying this relation is entirely contained
in the rectangle of Fig.\ 13.

The bounds in steps R2 and R3 define interior and exterior regions with simpler
boundary equations. The well-known inequality $$e↑x≥1+x,$$ which holds
for all real numbers $x$, can be used to show that
$$1+\ln c-cu\;≤\;-\ln u\;≤\;1/cu-1+\ln c\eqno(28)$$
for any constant $c>0$. Exercise 21 proves that $c=e↑{1/4}$ is the best possible
constant to use in step R2. The situation is more complicated in step R3,
and there doesn't seem to be a simple expression for the optimum $c$ in that
case, but computational experiments show that the best value for R3 is approximately
$e↑{1.35}$. The approximating curves (28) are tangent to the true boundary
when $u=1/c$.

It is possible to obtain a faster method by partitioning the region into
subregions, most of which can be handled more quickly. Of course, this
means that auxiliary tables will be needed, as in Algorithms M and F.

\yyskip \noindent({\:g5\/}) {\sl Variations of the normal distribution}.\xskip
So far we have considered the normal distribution with mean
zero and standard deviation one. If $X$ has this distribution,
then
$$Y = \mu + \sigma X\eqno(29)$$
has the normal distribution with mean $\mu$ and
standard deviation $\sigma $. Furthermore, if $X↓1$ and $X↓2$
are independent normal deviates with mean zero and standard
deviation one, and if
$$Y↓1 = \mu ↓1 + \sigma ↓1X↓1,\qquad Y↓2 = \mu ↓2 + \sigma
↓2(\rho X↓1 + \sqrt{1 - \rho ↑2}\,X↓2),\eqno(30)$$
then $Y↓1$ and $Y↓2$ are {\sl dependent} random
variables, normally distributed with means $\mu ↓1$, $\mu ↓2$ and
standard deviations $\sigma ↓1$, $\sigma ↓2$, and with correlation
coefficient $\rho $. (For a generalization to $n$ variables,
see exercise 13.)
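% [Editorial sketch, not part of the manuscript] Equations (29) and (30) in C:
% two independent standard normal deviates X1, X2 are turned into a pair with
% given means, standard deviations, and correlation coefficient rho.
%   #include <math.h>
%   void correlated_pair(double X1, double X2,
%                        double mu1, double sigma1,
%                        double mu2, double sigma2, double rho,
%                        double *Y1, double *Y2) {
%     *Y1 = mu1 + sigma1 * X1;                                      /* Eq. (29) */
%     *Y2 = mu2 + sigma2 * (rho * X1 + sqrt(1.0 - rho * rho) * X2); /* Eq. (30) */
%   }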

\subsectionbegin{D. The exponential distribution}
After uniform deviates and normal deviates, the next most important
random quantity is an {\sl exponential deviate.} Such numbers
occur in ``arrival time'' situations; for example, if a radioactive
substance emits alpha particles at a rate so that one particle
is emitted every $\mu$ seconds on the average, then the time
between two successive emissions has the exponential distribution
with mean $\mu $. This distribution is defined by the formula
$$F(x) = 1 - e↑{-x/\mu},\qquad x ≥ 0.\eqno(31)$$

\yyskip \noindent({\:g1\/}) {\sl Logarithm method.}\xskip Clearly,
if $y = F(x) = 1 - e↑{-x/\mu}$, then $x = F↑{-1}(y) = -\mu \ln(1 -
y)$. Therefore by Eq.\ (7), $-\mu \ln(1 - U)$ has the exponential
distribution. Since $1 - U$ is uniformly distributed when $U$
is, we conclude that
$$X = -\mu \ln U\eqno(32)$$
is exponentially distributed with mean $\mu$.
(The case $U = 0$ must be avoided.)
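% [Editorial sketch, not part of the manuscript] The logarithm method (32) in
% C; uniform() is an assumed (0,1) generator, so the case U = 0 cannot arise.
%   #include <stdlib.h>
%   #include <math.h>
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   double exponential_deviate(double mu) {
%     return -mu * log(uniform());                /* Eq. (32) */
%   }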

\yyskip \noindent({\:g2\/}) {\sl Random minimization method}.\xskip
We saw in Algorithm F that there are simple and fast alternatives
to calculating the logarithm of a uniform deviate. The following
especially efficient approach has been developed by G. Marsaglia,
M. Sibuya, and J. H. Ahrens.

\algbegin Algorithm S (Exponential distribution
with mean {\:g\char'26}). This algorithm produces exponential
deviates on a binary computer, using uniform deviates with $t$-bit
accuracy. The constants
$$Q[k] = {\ln 2\over 1!} + {(\ln 2)↑2\over 2!} + \cdots
+ {(\ln 2)↑k\over k!},\qquad k ≥ 1,\eqno(33)$$
should be precomputed, extending until $Q[k]
> 1 - 2↑{1-t}$.

\algstep S1. [Get $U$ and shift.] Generate
a uniform random binary fraction $U = (.b↓1b↓2 \ldotsm b↓t)↓2$; locate
the first zero bit $b↓j$, and shift off the leading $j$ bits,
setting $U ← (.b↓{j+1} \ldotsm b↓t)↓2$. (As in Algorithm F\null, the average
value of $j$ is 2.)

\algstep S2. [Immediate acceptance?] If $U < \ln 2$, set
$X ← \mu (j \ln 2 + U)$ and terminate the algorithm. (Note
that $Q[1] = \ln 2$.)

\algstep S3. [Minimize.] Find the least $k ≥ 2$ such
that $U < Q[k]$. Generate $k$ new uniform deviates $U↓1$, $\ldotss
$, $U↓k$ and set $V ← \min(U↓1, \ldotss , U↓k)$.

\algstep S4. [Deliver the answer.] Set $X ← \mu (j
+ V)\ln 2$.\quad\blackslug

\yyskip Alternative ways to generate exponential deviates (e.g., a ratio
of uniforms as in Algorithm R) might also be used.

\subsectionbegin{E. Other continuous distributions}
Let us now consider briefly how to handle some other distributions
which arise reasonably often in practice.

\penalty1000\vskip 6pt plus 12pt minus 4pt
\noindent({\:g1\/}) {\sl The gamma distribution} of order $a >
0$ is defined by
$$F(x) = {1\over \Gamma (a)} \int ↑{x}↓{0} t↑{a-1}e↑{-t}\,dt,\qquad
x ≥ 0.\eqno(34)$$
When $a = 1$, this is the exponential distribution
with mean 1; when $a = {1\over 2}$, it is the distribution of
${1\over 2}Z↑2$, where $Z$ has the normal distribution (mean
0, variance 1). If $X$ and $Y$ are independent gamma-distributed
random variables, of order $a$ and $b$ respectively, then $X
+ Y$ has the gamma distribution of order $a + b$. Thus, for
example, the sum of $k$ independent exponential deviates with
mean 1 has the gamma distribution of order $k$. If the logarithm
method (32) is being used to generate these exponential deviates,
we need compute only one logarithm: $X ← -\ln(U↓1 \ldotsm U↓k)$,
where $U↓1$, $\ldotss$, $U↓k$ are nonzero uniform deviates. This
technique handles all integer orders $a$; to complete the picture,
a suitable method for $0 < a < 1$ appears in exercise 16.
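% [Editorial sketch, not part of the manuscript] A gamma deviate of integer
% order k as the sum of k exponential deviates of mean 1, using the single
% logarithm X = -ln(U_1 ... U_k); suitable for modest k.  uniform() is an
% assumed (0,1) generator.
%   #include <stdlib.h>
%   #include <math.h>
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   double gamma_integer_order(int k) {
%     double prod = 1.0;
%     int i;
%     for (i = 0; i < k; i++) prod *= uniform();  /* product of k nonzero uniforms */
%     return -log(prod);
%   }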

The simple logarithm method is much too slow when $a$ is large,
since it requires $\lfloor a\rfloor$ uniform deviates. For large
$a$, the following algorithm due to J. H. Ahrens
is reasonably efficient, and easy to write in terms of standard
subroutines.
%folio 165 galley 7 (C) Addison-Wesley 1978	*
\algbegin Algorithm A (Gamma distribution of order $a >\;${\rm 1}).

\algstep A1. [Generate candidate.] Set $Y
← \tan(πU)$, where $U$ is a uniform deviate, and set $X ←
\sqrt{2a - 1}\, Y + a - 1$. (In place of $\tan(πU)$ we
could use a polar method, e.g., $V↓2/V↓1$ in step P4 of Algorithm
P.)

\algstep A2. [Accept?] If $X ≤ 0$, return to A1. Otherwise generate
a new uniform deviate $U$, and return to A1 if $U > (1 + Y↑2)\exp\biglp
(a - 1)\ln\biglp X/(a - 1)\bigrp - \sqrt{2a
- 1}\, Y\bigrp$.\quad\blackslug
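% [Editorial sketch, not part of the manuscript] Algorithm A in C, a direct
% transcription of steps A1-A2 for order a > 1; PI and uniform() are this
% sketch's own assumptions.
%   #include <stdlib.h>
%   #include <math.h>
%   #define PI 3.14159265358979323846
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   double gamma_deviate(double a) {              /* requires a > 1 */
%     for (;;) {
%       double Y = tan(PI * uniform());           /* step A1 */
%       double X = sqrt(2.0 * a - 1.0) * Y + a - 1.0;
%       if (X <= 0.0) continue;                   /* step A2 */
%       if (uniform() <= (1.0 + Y * Y) *
%             exp((a - 1.0) * log(X / (a - 1.0)) - sqrt(2.0 * a - 1.0) * Y))
%         return X;
%     }
%   }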

\yyskip The average number of times step
A1 is performed is $<1.902$ when $a ≥ 3$. For further discussion,
proof, and a slightly more complex method which is two to three
times faster, see J. H. Ahrens and U. Dieter, {\sl Computing}
{\bf 12} (1974), 223--246.

There is also an attractive approach for large $a$ based on the remarkable
fact that gamma deviates are approximately equal to $aX↑3$, where $X$ is
normally distributed with mean $1-1/(9a)$ and standard deviation $1/\sqrt{9a}$;
see G. Marsaglia, {\sl Computers and Math.\ \bf3} (1977), 321--325.\footnote
*{Change ``$+(3a-1)$'' to ``$-(3a-1)$'' in Step 3 of the algorithm on page 323.}

\yyskip\noindent({\:g2\/}) {\sl The beta distribution} with
positive parameters $a$ and $b$ is defined by
$$F(x) = {\Gamma (a + b)\over \Gamma (a)\Gamma (b)} \int ↑{x}↓{0}
t↑{a-1}(1 - t)↑{b-1}\,dt,\qquad 0 ≤ x ≤ 1.\eqno(35)$$
Let $X↓1$ and $X↓2$ be independent gamma deviates
of order $a$ and $b$, respectively, and set $X ← X↓1/(X↓1 +
X↓2)$. Another method, useful for small $a$ and $b$, is to set
$Y↓1 ← U↑{1/a}↓{1}$ and $Y↓2 ← U↑{1/b}↓{2}$ repeatedly until
$Y↓1 + Y↓2 ≤ 1$; then $X ← Y↓1/(Y↓1 + Y↓2)$. [See M. D. J\"ohnk,
{\sl Metrika \bf 8} (1964), 5--15.] Still another approach,
if $a$ and $b$ are integers (not too large), is to set $X$ to
the $b$th largest of $a + b - 1$ independent uniform deviates
(cf.\ exercise 5--7). See also the direct method described by R. C.
H. Cheng, {\sl CACM \bf21} (1978), 317--322.
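% [Editorial sketch, not part of the manuscript] Johnk's method for beta
% deviates, as quoted in the text above (useful for small a and b); uniform()
% is an assumed (0,1) generator.
%   #include <stdlib.h>
%   #include <math.h>
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   double beta_deviate(double a, double b) {
%     double Y1, Y2;
%     do {
%       Y1 = pow(uniform(), 1.0 / a);
%       Y2 = pow(uniform(), 1.0 / b);
%     } while (Y1 + Y2 > 1.0);                    /* repeat until Y1 + Y2 <= 1 */
%     return Y1 / (Y1 + Y2);
%   }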

\yyskip\noindent({\:g3\/}) {\sl The chi-square distribution} with
$\nu$ degrees of freedom (Eq.\ 3.3.1--22) is obtained by setting
$X ← 2Y$, where $Y$ has the gamma distribution of order $\nu
/2$.

\yyskip\noindent({\:g4\/}) {\sl The F-distribution} (variance-ratio
distribution) with $\nu↓1$ and $\nu↓2$ degrees of freedom is defined
by
$$F(x) = {\nu ↑{\nu ↓1/2}↓1\nu ↑{\nu ↓2/2}↓{2}\Gamma \biglp
(\nu ↓1 + \nu ↓2)/2\bigrp \over\Gamma (\nu ↓1/2)\Gamma (\nu ↓2/2)}
\int ↑{x}↓{0} t↑{\nu↓1/2-1}(\nu ↓2 + \nu ↓1t)↑{-\nu↓1/2-\nu↓2/2}\,dt,
\eqno(36)$$
where $x ≥ 0$. Let $Y↓1$ and $Y↓2$ be independent,
having the chi-square distribution with $\nu ↓1$ and $\nu ↓2$ degrees
of freedom, respectively; set $X ← Y↓1\nu ↓2/Y↓2\nu ↓1$. Or
set $X ← \nu ↓2Y/\nu ↓1(1 - Y)$, where $Y$ has the beta distribution
with parameters $\nu ↓1/2, \nu ↓2/2$.

\yyskip\noindent({\:g5\/}) {\sl The $t$-distribution} with $\nu$ degrees
of freedom is defined by
$$F(x) = {\Gamma \biglp (\nu + 1)/2\bigrp \over 
\sqrt{π\nu }\,\Gamma (\nu /2)} \int ↑{x}↓{-∞} (1 + t↑2/\nu )↑{-(\nu+1)/2}
\,dt.\eqno(37)$$
Let $Y↓1$ be a normal deviate (mean 0, variance
1) and let $Y↓2$ be independent of $Y↓1$, having the chi-square
distribution with $\nu$ degrees of freedom; set $X ← Y↓1/
\sqrt{Y↓2/\nu }$. [See also A. J. Kinderman, J. F. Monahan, and J. G.
Ramage, {\sl Math.\ Comp.\ \bf31} (1977), 1009--1018.]

\yyskip\noindent({\:g6\/}) {\sl Random point on $n$-dimensional sphere
with radius one.}\xskip Let $X↓1$, $X↓2$, $\ldotss$, $X↓n$ be independent
normal deviates (mean 0, variance 1); the desired point on the
unit sphere is
$$(X↓1/r, X↓2/r, \ldotss , X↓n/r),\qquad\hjust{where }r =
\sqrt{X↓1↑2 + X↓2↑2 + \cdots + X↓n↑2}.\eqno(38)$$
Note that if the $X$'s are calculated using the
polar method, Algorithm P\null, we compute two independent $X$'s
each time, and $X↓1↑2 + X↓2↑2 = -2 \ln S$ (in
the notation of that algorithm); this saves a little of the
time needed to evaluate $r$. The validity of this method comes
from the fact that the distribution function for the point $(X↓1,
\ldotss , X↓n)$ has a density which depends only on the distance
from the origin, so when it is projected onto the unit sphere
it has the uniform distribution. This method was first suggested
by G. W. Brown, in {\sl Modern Mathematics for the Engineer},
First series, ed. by E. F. Beckenbach (New York: McGraw-Hill,
1956), p.\ 302. To get a random point {\sl inside} the $n$-sphere,
R. P. Brent suggests taking a point on the surface and multiplying
it by $U↑{1/n}$.

In three dimensions a significantly simpler method
can be used, since each individual coordinate is uniformly distributed
between $-1$ and 1: Find $V↓1$, $V↓2$, $S$ by steps P1--P3 of Algorithm
P\null; then the desired random point on the surface of a globe is 
$(αV↓1, αV↓2, 2S-1)$, where $α
= 2\sqrt{1 - S}$. [Robert E. Knop, {\sl CACM}
{\bf 13} (1970), 326.]

\subsectionbegin{F. Important integer-valued distributions}
A probability distribution which consists only of integer values
can essentially be handled by the techniques described at the
beginning of this section; but some of these distributions are
so important in practice that they deserve special mention here.

\yyskip\noindent({\:g1\/}) {\sl The geometric distribution.}\xskip If
some event occurs with probability $p$, the number $N$ of independent
trials needed until the event first occurs (or between occurrences
of the event) has the geometric distribution. We have $N = 1$
with probability $p$, $N = 2$ with probability $(1 - p)p$, $\ldotss
$, $N = n$ with probability $(1 - p)↑{n-1}p$. This is essentially
the situation we have already considered in the gap
test of Section 3.3.2; it is also directly related to the number
of times certain loops in the algorithms of this section are
executed, e.g., steps P1--P3 of the polar method.

A convenient way to generate a variable with this distribution
is to set
$$N ← \lceil \ln U\,/\,\ln(1 - p)\rceil .\eqno(39)$$
To check this formula, we observe that $\lceil
\ln U\,/\,\ln(1 - p)\rceil = n$ if and only if $n - 1 < \ln U\,/\,\ln(1
- p) ≤ n$, that is, $(1 - p)↑{n-1} > U ≥ (1 - p)↑n$, and this
happens with probability $p(1 - p)↑{n-1}$ as required. Note
that $\ln U$ can be replaced by $-Y$, where $Y$ has the exponential
distribution with mean 1.

The special case $p = {1\over 2}$ can be handled more easily
on a binary computer, since formula (39) becomes $N ← \lceil
-\log↓2 U\rceil$; that is, $N$ is one more than the number of leading
zero bits in the binary representation of $U$.
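% [Editorial sketch, not part of the manuscript] Formula (39) in C, together
% with the special case p = 1/2 described above; uniform() is an assumed (0,1)
% generator.
%   #include <stdlib.h>
%   #include <math.h>
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   int geometric_deviate(double p) {             /* 0 < p < 1 */
%     return (int) ceil(log(uniform()) / log(1.0 - p));   /* Eq. (39) */
%   }
%   int geometric_half(void) {                    /* p = 1/2: one more than the  */
%     double U = uniform();                       /* number of leading zero bits */
%     int N = 1;
%     while (U < 0.5) { U += U; N++; }
%     return N;
%   }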
%folio 173 galley 8 (C) Addison-Wesley 1978	*
\yyskip\noindent({\:g2\/}) {\sl The binomial distribution $(t, p)$.}\xskip If some
event occurs with probability $p$ and if we carry out $t$ independent
trials, the total number $N$ of occurrences equals $n$ with
probability ${t\choose n}p↑n(1 - p)↑{t-n}$. (See Section 1.2.10.)
In other words if we generate $U↓1$, $\ldotss$, $U↓t$, we want to
count how many of these are $<p$. For small $t$ we can obtain
$N$ in exactly this way.
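% [Editorial sketch, not part of the manuscript] The direct method for the
% binomial distribution (t, p) when t is small: count how many of t uniform
% deviates are < p.  uniform() is an assumed (0,1) generator.
%   #include <stdlib.h>
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   int binomial_deviate(int t, double p) {
%     int i, N = 0;
%     for (i = 0; i < t; i++)
%       if (uniform() < p) N++;
%     return N;
%   }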

For large $t$, we can generate a beta variate $X$ with integer
parameters $a$ and $b$ where $a + b - 1 = t$; this effectively
gives us the $b$th largest of $t$ elements, without bothering
to generate the other elements. Now if $X ≥ p$, we set $N ←
N↓1$ where $N↓1$ has the binomial distribution $(a - 1, p/X)$,
since this tells us how many of $a - 1$ random numbers in the
range $[0, X)$ are $<p$; and if $X < p$, we set $N ← a + N↓1$
where $N↓1$ has the binomial distribution $\biglp b - 1, (p
- X)/(1 - X)\bigrp $, since $N↓1$ tells us how many of $b -
1$ random numbers in the range $[X, 1)$ are $<p$. By choosing
$a = 1 + \lfloor t/2\rfloor $, the parameter $t$ will be reduced
to a reasonable size after about $\lg t$ reductions of this kind.
(This approach is due to J. H. Ahrens, who has also suggested
an alternative for medium-sized $t$; see exercise 27.)

\yyskip\noindent({\:g3\/}) {\sl The Poisson distribution} with mean
$\mu$. \xskip This distribution is related to the exponential
distribution as the binomial distribution is related to the
geometric: it represents the number of occurrences, per unit
time, of an event that can occur at any instant of time. For
example, the number of alpha particles emitted by a radioactive
substance in a single second has a Poisson distribution.

According to this principle, we can produce a Poisson deviate
$N$ by generating independent exponential deviates $X↓1$, $X↓2$,
$\ldots$ with mean $1/\mu $, stopping as soon as $X↓1 + \cdots
+ X↓m ≥ 1$; then $N ← m - 1$. The probability that $X↓1 + \cdots +
X↓m ≥ 1$ is the probability that a gamma deviate of order $m$
is $≥\mu $, namely $\int ↑{∞}↓{\mu } t↑{m-1}e↑{-t}\,dt/(m - 1)!$;
hence the probability that $N = n$ is
$${1\over n!} \int ↑{∞}↓{\mu } t↑ne↑{-t}\,dt \;-\; {1\over (n -
1)!} \int ↑{∞}↓{\mu } t↑{n-1}e↑{-t}\,dt = e↑{-\mu} {\mu ↑n\over
n!} ,\qquad n ≥ 0.\eqno(40)$$
If we generate exponential deviates by the logarithm
method, the above recipe tells us to stop when $-(\ln U↓1 + \cdots
+ \ln U↓m)/\mu ≥ 1$. Simplifying this expression, we see that
the desired Poisson deviate can be obtained by calculating $e↑{-\mu}$,
converting it to a fixed-point representation, then generating
one or more uniform deviates $U↓1$, $U↓2$, $\ldots$ until the product satisfies
$U↓1 \ldotsm U↓m ≤ e↑{-\mu}$, finally setting $N ← m - 1$.
On the average this requires the generation of $\mu + 1$ uniform
deviates, so it is a very useful approach when $\mu$ is not
too large.
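% [Editorial sketch, not part of the manuscript] The product-of-uniforms recipe
% just described, for Poisson deviates of mean mu (best when mu is not too
% large); uniform() is an assumed (0,1) generator.
%   #include <stdlib.h>
%   #include <math.h>
%   static double uniform(void) { return (rand() + 1.0) / ((double) RAND_MAX + 2.0); }
%   int poisson_deviate(double mu) {
%     double limit = exp(-mu), prod = 1.0;
%     int m = 0;
%     do { prod *= uniform(); m++; } while (prod > limit);
%     return m - 1;                               /* N = m - 1 */
%   }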

When $\mu$ is large, we can obtain a method of order $\log \mu$
by using the fact that we know how to handle the gamma and binomial
distributions for large orders: First generate $X$ with the
gamma distribution of order $m = \lfloor α\mu \rfloor $, where
$α$ is a suitable constant. (Since $X$ is equivalent to $-\ln(U↓1
\ldotsm U↓m)$, we are essentially bypassing $m$ steps of the
previous method.) If $X < \mu $, set $N ← m + N↓1$, where $N↓1$
is a Poisson deviate of mean $\mu - X$; and if $X ≥ \mu $, set
$N ← N↓1$, where $N↓1$ has the binomial distribution $(m - 1,
\mu /X)$. This method is due to J. H. Ahrens and U. Dieter,
whose experiments suggest that ${7\over 8}$ is a good choice
for $α$.

The validity of the above reduction when $X ≥ \mu$ is a consequence
of the following important principle: ``Let $X↓1$, $\ldotss$, $X↓m$
be independent exponential deviates with the same mean; let
$S↓j = X↓1 + \cdots + X↓j$ and let $V↓j = S↓j/S↓m$ for $1 ≤
j ≤ m$. Then the distribution of $V↓1$, $V↓2$, $\ldotss$, $V↓{m-1}$
is the same as the distribution of $m - 1$ independent uniform
deviates sorted into increasing order.'' To establish this principle
formally, we compute the probability that $V↓1 ≤ v↓1$, $\ldotss
$, $V↓{m-1} ≤ v↓{m-1}$, given the value of $S↓m = s$, for arbitrary
values $0 ≤ v↓1 ≤ \cdots ≤v↓{m-1} ≤ 1$:
$$\twoline{\textstyle\hskip-10pt{\int ↑{v↓1s}↓{0}\hskip-1.5pt \mu e↑{-t↓1/\mu}\,dt↓1
\int ↑{v↓2s-t↓1}↓{0} \hskip-1.5pt \mu e↑{-t↓2/\mu}\,dt↓2 \,\ldots
\int ↑{v↓{m-1}s-t↓1 - \cdots - t↓{m-2}}↓0\hskip-1.5pt\mu e↑{-t↓{m-1}/\mu}
\,dt↓{m-1} \cdot \mu e↑{-(s-t↓1-\cdots-t↓{m-1})/\mu}\over
\int ↑{s}↓{0} \mu e↑{-t↓1/\mu}\,dt↓1
\int ↑{s-t↓1}↓{0} \mu e↑{-t↓2/\mu}\,dt↓2\,\ldots
\int ↑{s-t↓1 - \cdots - t↓{m-2}}↓0\mu e↑{-t↓{m-1}/\mu}
\,dt↓{m-1} \cdot \mu e↑{-(s-t↓1-\cdots-t↓{m-1})/\mu}}}{3pt}{
= {\int ↑{v↓1}↓0\, du↓1\, \int ↑{v↓2}↓{u↓1}\,
du↓2\ldotsm\int ↑{v↓{m-1}}↓{u↓{m-2}} du↓{m-1}\over
\int ↑{1}↓0\, du↓1\, \int ↑{1}↓{u↓1}\,
du↓2\ldotsm\int ↑{1}↓{u↓{m-2}} du↓{m-1}},}$$
by making the substitution $t↓1 = su↓1$, $t↓1 +
t↓2 = su↓2$, $\ldotss$, $t↓1 + \cdots + t↓{m-1} = su↓{m-1}$. The
latter ratio is the corresponding probability that uniform deviates
$U↓1$, $\ldotss$, $U↓{m-1}$ satisfy $U↓1 ≤ v↓1$, $\ldotss$, $U↓{m-1}
≤ v↓{m-1}$, given that $U↓1 ≤ \cdots ≤ U↓{m-1}$.

A more efficient but somewhat more complicated technique for binomial
and Poisson deviates is sketched in exercise 22.

\subsectionbegin{G. For further reading} The forthcoming book
{\sl Non-Uniform Random Numbers} by J. H. Ahrens and U. Dieter
discusses many more algorithms for the generation
of random variables with nonuniform distributions, together
with a careful consideration of the efficiency of each technique
on typical computers.

From a theoretical point of view it is interesting to consider {\sl optimal}
methods for generating random variables with a given distribution, in
the sense that the method produces the desired result from the minimum
possible number of random bits. For the beginnings of a theory dealing with
such questions, see D. E. Knuth and A. C. Yao, {\sl Algorithms and Complexity},
ed. by J. F. Traub (New York: Academic Press, 1976), 357--428.

Exercise 16 is recommended as a review of many of the techniques
in this section.
%folio 176 galley 9 (C) Addison-Wesley 1978	*
\exbegin{EXERCISES}

\exno 1. [10] If $α$
and $β$ are real numbers with $α < β$, how would you generate
a random real number uniformly distributed between $α$ and $β$?

\exno 2. [M16] Assuming that $mU$
is a random integer between 0 and $m - 1$, what is the {\sl
exact} probability that $\lfloor kU\rfloor = r$, if $0 ≤ r <
k$? Compare this with the desired probability $1/k$.

\trexno 3. [14] Discuss treating
$U$ as an integer and {\sl dividing} by $k$ to get a random
integer between 0 and $k - 1$, instead of multiplying as suggested
in the text. Thus (1) would be changed to
$$\vcenter{\mixtwo{ENTA⊗0\cr
LDX⊗U\cr
DIV⊗K\cr}}$$
with the result appearing in register X. Is this a good method?

\exno 4. [M20] Prove the two relations in (8).

\trexno 5. [21] Suggest an efficient
method to compute a random variable with the distribution $px
+ qx↑2 + rx↑3$, where $p ≥ 0$, $q ≥ 0$, $r ≥ 0$, and $p + q + r
= 1$.

\exno 6. [HM21] A quantity $X$ is computed
by the following method:
$$\eqalign{\hjust{``{\sl Step }1. }⊗\hjust{Generate two independent uniform
deviates $U, V$.}\cr\noalign{\vskip 3pt}
\hjust{\sl Step \rm 2. }⊗\hjust{If $U↑2 + V↑2 ≥ 1$, return
to step 1; otherwise set $X ← U$.''}\cr}$$
What is the distribution function of $X$? How
many times will step 1 be performed? (Give the mean and standard
deviation.)

\trexno 7. [20] (A. J. Walker.)\xskip Suppose we have a bunch of cubes of
$k$ different colors, say $n↓j$ cubes of color $C↓j$ for $1≤j≤k$, and we
also have $k$ boxes $\{B↓1,\ldotss,B↓k\}$ each of which can hold exactly
$n$ cubes. Furthermore $n↓1+\cdots+n↓k=kn$, so the cubes will just fit in
the boxes.
Prove (constructively) that there is always a way to put the cubes into the
boxes so that each box contains at most two different colors of cubes; in fact,
there is a way to do it so that, whenever box $B↓j$ contains two colors, one
of those colors is $C↓j$.  Show how to use this principle to compute the $P$
and $Y$ tables required in (3), given a probability distribution $(p↓1,\ldotss,
p↓k)$.

\exno 8. [M15] Show that operation (3) could be changed to
$$\hjust{if}\quad U<P↓K\quad\hjust{then}\quad X←x↓{K+1}\quad\hjust{otherwise}
\quad X←Y↓K$$ (i.e., using the original value of $U$ instead of $V$) if this
were more convenient, by suitably modifying $P↓0$, $P↓1$, $\ldotss$, $P↓{k-1}$.

\exno 9. [HM10] Why is the curve
$f(x)$ of Fig.\ 9 concave downward for $x < 1$, concave upward
for $x > 1$?

\trexno 10. [HM24] Explain how to calculate auxiliary constants
$P↓j$, $Q↓j$, $Y↓j$, $Z↓j$, $S↓j$, $D↓j$, $E↓j$ so that Algorithm M
delivers answers with the correct distribution.

\trexno 11. [HM27] Prove that steps M7--M8 of Algorithm M generate
a random variable with the appropriate tail of the normal distribution;
i.e., the probability that $X ≤ x$ should be
$$\int ↑{x}↓{3} e↑{-t↑2/2}\,dt\,\vcenter{\hjust{\:@\char'36}} \int ↑{∞}↓{3}
e↑{-t↑2/2}\,dt,\qquad x ≥ 3.$$
[{\sl Hint:} Show that it is a special case of the rejection method, with
$g(t)=Cte↑{-t↑2/2}$ for some $C$.]

\exno 12. [HM23] (R. P. Brent.)\xskip Prove that the numbers $a↓j$
defined in (23) satisfy the relation $a↓j↑2/2 - a↑{2}↓{j-1}/2 < \ln
2$ for all $j ≥ 1$.\xskip [{\sl Hint:} If $f(x) = e↑{x↑2/2}
\int ↑{∞}↓{x} e↑{-t↑2/2}\,dt$, show that $f(x) < f(y)$
for $0 ≤ x < y$.]

\exno 13. [HM25] Given a set of $n$ independent normal deviates,
$X↓1$, $X↓2$, $\ldotss$, $X↓n$, with mean 0 and variance 1, show how
to find constants $b↓j$ and $a↓{ij}$, $1 ≤ j ≤ i ≤ n$, so that if
$$\twoline{Y↓1 = b↓1 + a↓{11}X↓1,\qquad Y↓2 = b↓2 + a↓{21}X↓1
+ a↓{22}X↓2,\qquad \ldotss,}{2pt}{
Y↓n = b↓n + a↓{n1}X↓1 + a↓{n2}X↓2 + \cdots +a↓{nn}X↓n,}$$
then $Y↓1$, $Y↓2$, $\ldotss$, $Y↓n$ are dependent
normally distributed variables, $Y↓j$ has mean $\mu ↓j$, and
the $Y$'s have a given covariance matrix $(c↓{ij})$. $\biglp$The covariance,
$c↓{ij}$, of $Y↓i$ and $Y↓j$ is defined to be the average value
of $(Y↓i - \mu ↓i)(Y↓j - \mu ↓j)$. In particular, $c↓{jj}$ is
the variance of $Y↓j$, the square of its standard deviation.
Not all matrices $(c↓{ij})$ can be covariance matrices, and
your construction is, of course, only supposed to work whenever
a solution to the given conditions is possible.$\bigrp$

\exno 14. [M21] If $X$ is a random variable with continuous
distribution $F(x)$, and if $c$ is a constant, what is the distribution
of $cX$?

\exno 15. [HM21] If $X↓1$ and $X↓2$ are independent random variables
with the respective distributions $F↓1(x)$ and $F↓2(x)$, and with
densities $f↓1(x) = {F↓1}↑\prime (x)$, $f↓2(x) = {F
↓2}↑\prime (x)$, what are the distribution and density functions
of the quantity $X↓1 + X↓2$?

\trexno 16. [HM22] (J. H. Ahrens.)\xskip Develop an algorithm for gamma
deviates of order $a$ when $0 < a ≤ 1$, using the rejection
method with $cg(t) = t↑{a-1}/\Gamma (a)$ for $0 < t < 1$, $e↑{-t}/\Gamma
(a)$ for $t ≥ 1$.

\trexno 17. [M24] What is the {\sl distribution function} $F(x)$
for the geometric distribution with probability $p$? What is
the {\sl generating function} $G(z)$? What are the mean and standard
deviation of this distribution?

\exno 18. [M24] Suggest a method to compute a random integer
$N$ for which $N$ takes the value $n$ with probability $np↑2(1
- p)↑{n-1}$, $n ≥ 0$. (The case of particular interest is when
$p$ is rather small.)

\exno 19. [22] The {\sl negative binomial distribution} $(t, p)$
has integer values $N = n$ with probability ${t - 1 + n\choose
n}p↑t(1 - p)↑n$. (Unlike the ordinary binomial distribution,
$t$ need not be an integer, since this quantity is nonnegative
for all $n$ whenever $t > 0$.) Generalizing exercise 18, explain
how to generate integers $N$ with this distribution when $t$
is a small positive integer. What method would you suggest if
$t = p = {1\over 2}$?

\exno 20. [M20] Let $A$ be the area of the shaded region in Fig.\ 13, and
let $R$ be the area of the enclosing rectangle.  Let $I$ be the area of
the interior region recognized by step R2, and let $E$ be the area between
the exterior region rejected in step R3 and the outer rectangle. Determine
the number of times each step of Algorithm R is performed, for each of its
four variants as in (25), in terms of $A$, $R$, $I$, and $E$.

\exno 21. [HM29] Derive formulas for the quantities $A$, $R$, $I$, and $E$
defined in exercise 20. (For $I$ and especially $E$ you may wish to use an
interactive computer algebra system.) Show that $c=e↑{1/4}$ is the best
constant in step R2 for tests of the form ``$X↑2≤4(1+\ln c)-4cU$.''
%folio 178 galley 10 (C) Addison-Wesley 1978	*
\exno 22. [HM40] Can the exact Poisson distribution for large
$\mu$ be obtained by generating an appropriate normal deviate,
converting it to an integer in some convenient way, and applying
a (possibly complicated) correction a small percent of the time?

\exno 23. [HM23] (J. von Neumann.)\xskip Are the following two ways
to generate a random quantity $X$ equivalent (i.e., does the
quantity $X$ have the same distribution)?
$$\eqalign{\noalign{\vskip-4pt}
\hjust{\sl Method \:g1\/\rm: }⊗\hjust{Set $X ← \sin\biglp
(π/2)U\bigrp $, where $U$ is uniform.}\cr\noalign{\vskip 3pt}
\hjust{\sl Method \:g2\/\rm: }⊗\hjust{Generate two uniform deviates,
$U$, $V$, and if $U↑2+V↑2≥1$, repeat}\cr
⊗\hjust{until $U↑2
+ V↑2 < 1$. Then set $X ← |U↑2 - V↑2|/(U↑2 + V↑2)$.}\cr
\noalign{\vskip-4pt}}$$

\exno 24.  [HM40] (S. Ulam, J. von Neumann.)\xskip
Let $V↓0$ be a randomly selected real
number between 0 and 1, and define the sequence $\langle V↓n\rangle$
by the rule $V↓{n+1} = 4V↓n(1 - V↓n)$. If this computation
is done with perfect accuracy, the result should be a sequence
with the distribution $\sin↑2 πU$, where $U$ is uniform, i.e.,
with distribution function
$F(x) = \int ↑{x}↓{0} dt\left/π\sqrt{t(1 - t)}\right.$.
For if we write $V↓n = \sin↑2 πU↓n$, we find that
$U↓{n+1} = (2U↓n)\mod 1$; and by the fact that almost all real
numbers have a random binary expansion (see Section 3.5), this
sequence $U↓n$ is equidistributed. But if the computation of
$V↓n$ is done with only finite accuracy, the above argument
breaks down because we soon are dealing with noise from the
roundoff error.\xskip [{\sl Reference:} von Neumann's {\sl Collected
Works \bf 5}, 768--770.]

Analyze the sequence $\langle V↓n\rangle$ defined above when
only finite accuracy is present, both empirically (for various
different choices of $V↓0$) and theoretically. Does the sequence
have a distribution resembling the expected distribution?

\exno 25. [M25] Let $X↓1$, $X↓2$, $\ldotss$, $X↓5$ be binary words
each of whose bits is independently 0 or 1 with probability
${1\over 2}$. What is the probability that a given bit position
of $X↓1 ∨ \biglp X↓2 ∧ \biglp X↓3 ∨ (X↓4 ∧ X↓5)\bigrp\bigrp$
contains a 1? Generalize.

\exno 26.  [M18] Let $N↓1$
and $N↓2$ be independent Poisson deviates with respective means
$\mu ↓1$ and $\mu ↓2$, where $\mu ↓1 > \mu ↓2 ≥ 0$. Prove or
disprove:\xskip (a) $N↓1 + N↓2$ has the Poisson distribution with
mean $\mu ↓1 + \mu ↓2$.\xskip (b) $N↓1 - N↓2$ has the Poisson distribution
with mean $\mu ↓1 - \mu ↓2$.

\exno 27. [22] (J. H. Ahrens.)\xskip On most binary computers there
is an efficient way to count the number of 1's in a binary word
(cf.\ Section 7.1). Hence there is a nice way to obtain the binomial
distribution $(t, p)$ when $p = {1\over 2}$, simply by generating
$t$ random bits and counting the number of 1's.

Design an algorithm which produces the binomial distribution
$(t, p)$ for {\sl arbitrary} $p$, using only a subroutine for
the special case $p = {1\over 2}$ as a source of random data.\xskip
[{\sl Hint:} Simulate a process which first looks at the most
significant bits of $t$ uniform deviates, then at the second
bit of those deviates whose leading bit is not sufficient to
determine whether or not their value is $<p$, etc.]

\exno 28. [HM35] (R. P. Brent.)\xskip Develop a method to generate
a random point on the surface of the ellipsoid defined by $\sum
a↓kx↑{2}↓{k} = 1$, where $a↓1 ≥ \cdots ≥ a↓n > 0$.
  
%folio 182 galley 11 (C) Addison-Wesley 1978	*
\runningrighthead{RANDOM SAMPLING AND SHUFFLING}
\section{3.4.2}
\sectionskip
\sectionbegin{3.4.2. Random Sampling and Shuffling}
Many data processing applications call for an unbiased choice of $n$ records
at random from a file containing $N$ records. This problem arises, for example,
in quality control or other statistical calculations where sampling is needed.
Usually $N$ is very large, so that it is impossible to contain all the data
in memory at once; and the individual records themselves are often very large,
so that we can't even hold $n$ records in memory. Therefore we seek an efficient
procedure for selecting $n$ records by deciding whether to accept or reject
each record as it comes along, writing the accepted records onto an
output file.

Several methods have been devised for this problem. The most obvious approach is
to select each record with probability $n/N$; this may sometimes be appropriate,
but it gives only an {\sl average} of $n$ records in the sample. The standard
deviation is $\sqrt{n(1-n/N)}$, and it is possible that the sample
will be either too large for the desired application, or too small to give
the necessary results.

A simple modification of the ``obvious'' procedure gives what we want:
The $(t + 1)$st record should be selected
with probability $(n - m)/(N - t)$, if $m$ items have already
been selected. This is the appropriate probability, since of
all the possible ways to choose $n$ things from $N$ such that
$m$ values occur in the first $t$, exactly
$${N - t - 1\choose n - m - 1}\left/{N - t\choose
n - m}\right. = {n - m\over N - t}\eqno(1)$$
of these select the $(t + 1)$st element.
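For example, when $N = 10$, $n = 3$, $t = 4$, and $m = 1$, exactly
${5\choose 1} = 5$ of the ${6\choose 2} = 15$ ways to complete the sample
include the fifth record, and $5/15 = (3 - 1)/(10 - 4)$, in agreement with (1).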

The idea developed in the preceding paragraph leads immediately
to the following algorithm:

\algbegin Algorithm S (Selection sampling technique). 
To select $n$ records at random from a set of $N$, where $0
< n ≤ N$.

\algstep S1. [Initialize.] Set $t ← 0$, $m
← 0$. (During this algorithm, $m$ represents the number of records
selected so far, and $t$ is the total number of input records
we have dealt with.)

\algstep S2. [Generate $U$.] Generate a random number $U$,
uniformly distributed between zero and one.

\algstep S3. [Test.] If $(N - t)U ≥ n - m$, go to step S5.

\algstep S4. [Select.] Select the next record for the sample,
and increase $m$ and $t$ by 1. If $m < n$, go to step S2; otherwise
the sample is complete and the algorithm terminates.

\algstep S5. [Skip.] Skip the next record (do not include it
in the sample), increase $t$ by 1, and go to step S2.\quad\blackslug
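
\yskip Readers who want to see the method spelled out in a programming
language may find the following sketch helpful. It is only an informal
rendering of Algorithm S in C, assuming a routine uniform() that delivers
uniform deviates between zero and one; the library function rand() appears
merely as a placeholder for one's favorite generator.

/* Sketch of Algorithm S (selection sampling) in C. */
#include <stdlib.h>

static double uniform(void)           /* placeholder uniform deviate */
{
    return rand() / ((double) RAND_MAX + 1.0);
}

/* Sets sample[t] = 1 for each of the n selected records, 0 <= t < N. */
void select_sample(int N, int n, char sample[])
{
    int t = 0, m = 0;                 /* S1: t records seen, m selected */
    while (m < n) {
        double U = uniform();         /* S2: generate U */
        if ((N - t) * U < n - m) {    /* S3: select with prob. (n-m)/(N-t) */
            sample[t++] = 1;  m++;    /* S4: select */
        } else {
            sample[t++] = 0;          /* S5: skip */
        }
    }
    while (t < N) sample[t++] = 0;    /* once m = n, the rest are skipped */
}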

\yyskip This algorithm may appear to be unreliable at first glance and,
in fact, to be incorrect;
but a careful analysis (see the exercises below) shows that
it is completely trustworthy. It is not difficult to verify that

\yskip\textindent{a)}\hang
At most $N$ records are input (we never run off the end
of the file before choosing $n$ items).

\yskip\textindent{b)}\hang
The sample is completely unbiased; in particular,
the probability that any given element is selected, e.g., the
last element of the file, is $n/N$.

\yskip Statement (b) is true in spite of the fact
that we are {\sl not} selecting the $(t + 1)$st item with probability
$n/N$; we select it with the probability in Eq.\ (1)! This has
caused some confusion in the published literature. Can the reader
explain this seeming contradiction?

({\sl Note:} When using Algorithm S\null, one should be careful
to use a different source of random numbers $U$ each time the
program is run, to avoid connections between the samples obtained
on different days. This can be done, for example, by choosing
a different value of $X↓0$ for the linear congruential method
each time; $X↓0$ could be set to the current date, or to the
last $X$ value generated on the previous run of the program.)

We will usually not have to pass over all $N$ records; in fact,
since (b) above says that the last record is selected with probability
$n/N$, we will terminate the algorithm {\sl before} considering
the last record exactly $(1 - n/N)$ of the time. The average
number of records considered when $n = 2$ is about ${2\over
3}N$, and the general formulas are given in exercises 5 and
6.

Algorithm S and a number of other sampling techniques are discussed
in a paper by C. T. Fan, Mervin E. Muller, and Ivan Rezucha,
{\sl J. Amer.\ Stat.\ Assoc.\ \bf 57}
(1962), 387--402. The method was independently discovered by
T. G. Jones, {\sl CACM \bf 5} (1962), 343.

\yskip A problem arises if we don't know the value of $N$ in advance,
since the precise value of $N$ is crucial in Algorithm S\null. Suppose
we want to select $n$ items at random from a file, without knowing
exactly how many are present in that file. We could first go
through and count the records, then take a second pass to select
them; but it is generally better to sample $m ≥ n$ of the original
items on the first pass, where $m$ is much less than $N$, so
that only $m$ items must be considered on the second pass. The
trick, of course, is to do this in such a way that the final
result is a truly random sample of the original file.

Since we don't know when the input is going to end, we must
keep track of a random sample of the input records seen so far,
thus always being prepared for the end. As we read the input
we will construct a ``reservoir'' which contains only those
$m$ records which have appeared among the previous samples.
The first $n$ records always go into the reservoir. When the
$(t + 1)$st record is being input, for $t ≥ n$, we will have
in memory a table of $n$ indices pointing to those records in
the reservoir which belong to the random sample we have chosen
from the first $t$ records. The problem is to maintain this
situation with $t$ increased by one, namely to find a new random
sample from among the $t + 1$ records now known to be present.
It is not hard to see that we should include the new record
in the new sample with probability $n/(t + 1)$, and in such
a case it should replace a random element of the previous sample.

Thus, the following procedure does the job:

\algbegin Algorithm R (Reservoir sampling). To select
$n$ records at random from a file of unknown size $≥n$, given
$n > 0$. An auxiliary file called the ``reservoir'' contains
all records which are candidates for the final sample. The algorithm
uses a table of distinct indices $I[j]$ for $1 ≤ j ≤ n$, each
of which points to one of the records in the reservoir.

\algstep R1. [Initialize.] Input the first
$n$ records and copy them to the reservoir. Set $I[j] ←
j$ for $1 ≤ j ≤ n$, and set $t ← m ← n$. (If the file being sampled
has fewer than $n$ records, it will of course be necessary to
abort the algorithm and report failure. During this algorithm,
$\{I[1], \ldotss , I[n]\}$ point to the records in the current
sample, $m$ is the size of the reservoir, and $t$ is the number
of input records dealt with so far.)

\algstep R2. [End of file?] If there are no more records to
be input, go to step R6.

\algstep R3. [Generate and test.] Increase $t$ by
1, then generate a random integer $M$ between 1 and $t$ (inclusive).
If $M > n$, go to R5.

\algstep R4. [Add to reservoir.] Copy the next record of the
input file to the reservoir, increase $m$ by 1, and set
$I[M] ← m$. (The record previously pointed to by $I[M]$ is
being replaced in the sample by the new record.) Go back to
R2.

\algstep R5. [Skip.] Skip over the next record of the input
file (do not include it in the reservoir), and return to step
R2.

\algstep R6. [Second pass.] Sort the $I$ table entries so that
$I[1] < \cdots < I[n]$; then go through the reservoir, copying
the records with these indices into the output file which is
to hold the final sample.\quad\blackslug
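
\yskip The following C sketch mirrors the first pass of Algorithm R\null.
Two simplifications are assumptions of the sketch rather than part of the
algorithm: the reservoir is kept as an array of record numbers instead of
an auxiliary file, and the input is read through a caller-supplied routine
that returns the next record number, or $-1$ at end of file.

/* First pass of Algorithm R (reservoir sampling), sketched in C.
   reservoir[] is indexed from 1 and must be large enough to hold
   every candidate; its expected final size is about n(1 + ln(N/n)). */
#include <stdlib.h>

static long rand_int(long t)          /* random integer between 1 and t */
{
    return 1 + (long)(t * (rand() / ((double) RAND_MAX + 1.0)));
}

/* Returns the reservoir size m, or -1 if fewer than n records exist.
   Afterwards I[1..n] index the chosen sample within reservoir[1..m]. */
long reservoir_pass(int n, long (*next_record)(void),
                    long reservoir[], int I[])
{
    long t, m, M, r;
    for (m = 1; m <= n; m++) {                /* R1: first n records */
        if ((r = next_record()) < 0) return -1;
        reservoir[m] = r;  I[m] = m;
    }
    t = m = n;
    while ((r = next_record()) >= 0) {        /* R2: end of file? */
        M = rand_int(++t);                    /* R3: random M in 1..t */
        if (M <= n) {                         /* R4: add to reservoir */
            reservoir[++m] = r;  I[M] = m;
        }                                     /* R5: otherwise skip */
    }
    return m;             /* R6, sorting I and copying out, is omitted */
}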

\yyskip Algorithm R is due to Alan
G. Waterman. The reader may wish to work out the example of
its operation which appears in exercise 9.

If the records are sufficiently short, it is of course unnecessary
to have a reservoir at all; we can keep the $n$ records of the
current sample in memory at all times (see exercise 10).
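
One way to carry out that modification (a sketch of the idea only, under
the same assumptions as the previous sketch and using its random-integer
routine) is to keep the current sample itself in an array:

/* Reservoir-free variant (cf. exercise 10): sample[1..n] always holds
   a random sample of the records seen so far. */
long sample_without_reservoir(int n, long (*next_record)(void), long sample[])
{
    long t, M, r;
    for (t = 1; t <= n; t++) {          /* the first n records are taken */
        if ((r = next_record()) < 0) return -1;  /* fewer than n records */
        sample[t] = r;
    }
    t = n;
    while ((r = next_record()) >= 0) {
        M = rand_int(++t);              /* random integer between 1 and t */
        if (M <= n) sample[M] = r;      /* new record replaces a random one */
    }
    return t;                           /* total number of input records */
}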

The natural question to ask about Algorithm R is, ``What is
the expected size of the reservoir?'' Exercise 11 shows that
the average value of $m$ is exactly $n(1 + H↓N - H↓n)$; this
is approximately $n\biglp 1 + \ln(N/n)\bigrp$. So if $N/n =
1000$, the reservoir will contain only about $1\over 125$ as many items
as the original file.
%folio 186 galley 12 (C) Addison-Wesley 1978	*
Note that Algorithms S and R can be used to obtain samples
for several independent categories simultaneously. For example,
if we have a large file of names and addresses of U.S. residents,
we could pick random samples of exactly 10 people from each
of the 50 states without making 50 passes through the file,
and without first sorting the file by state.

\yskip The {\sl sampling problem}, as described here, can be regarded
as the computation of a random {\sl combination}, according
to the conventional definition of combinations of $N$ things
taken $n$ at a time (see Section 1.2.6). Now let us consider
the problem of computing a random {\sl permutation} of $t$ objects;
we will call this the {\sl shuffling problem}, since shuffling a
deck of cards is nothing more than subjecting it to a random
permutation.

A moment's reflection is enough to convince oneself that the
approaches people traditionally use to shuffle cards are miserably
inadequate; there is no hope of obtaining each of the $t!$ permutations
with anywhere near equal probability by such methods. It has
been said that expert bridge players make use of this fact when
deciding whether or not to ``finesse.''

If $t$ is small, we can obtain a random permutation very quickly
by generating a random integer between 1 and $t!$. For example,
when $t = 4$, a random number between 1 and 24 suffices to select
a random permutation from a table of all possibilities. But
for large $t$, it is necessary to be more careful if we want
to claim that each permutation is equally likely, since $t!$
is much larger than the accuracy of individual random numbers.

A suitable shuffling procedure can be obtained by recalling
Algorithm 3.3.2P\null, which gives a simple correspondence between
each of the $t!$ possible permutations and a sequence of numbers
$(c↓1, c↓2, \ldotss , c↓{t-1})$ with $0 ≤ c↓j ≤ j$. It is easy
to compute such a set of numbers at random, and we can use the
correspondence to produce a random permutation.

\algbegin Algorithm P (Shuffling). Let $X↓1$,
$X↓2$, $\ldotss$, $X↓t$ be a set of $t$ numbers to be shuffled.

\algstep P1. [Initialize.] Set $j ← t$.

\algstep P2. [Generate $U$.] Generate a random number $U$,
uniformly distributed between zero and one.

\algstep P3. [Exchange.] Set $k ← \lfloor jU\rfloor + 1$.\xskip (Now
$k$ is a random integer between 1 and $j$.)\xskip Exchange $X↓k ↔
X↓j$.

\algstep P4. [Decrease $j$.] Decrease $j$ by 1. If $j >
1$, return to step P2.\quad\blackslug
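
\yskip In C, with uniform deviates supplied by the uniform() routine of
the earlier sketch, Algorithm P takes only a few lines; the following
version shuffles an array indexed from 1, as in the algorithm.

/* Sketch of Algorithm P (shuffling) for X[1..t]. */
void shuffle(long X[], int t)
{
    int j, k;
    long temp;
    for (j = t; j > 1; j--) {             /* P1 and P4 */
        k = (int)(j * uniform()) + 1;     /* P2, P3: random k in 1..j */
        temp = X[k];  X[k] = X[j];  X[j] = temp;    /* P3: exchange */
    }
}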

\yyskip This algorithm was first
published by L. E. Moses and R. V. Oakford, in {\sl Tables of
Random Permutations} (Stanford University Press, 1963); and
by R. Durstenfeld, {\sl CACM \bf 7} (1964), 420. It can also be
modified to obtain a random permutation of a random combination (see exercise 14).
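
The modification is simply to stop the exchanges early, as exercise 14
explains; the following sketch uses $O(t)$ cells of memory and therefore
does {\sl not} answer that exercise, which asks for an $O(n)$-cell
simulation.

/* Truncated shuffle: afterwards X[t-n+1..t] is a random permutation
   of a random n-combination of the original contents of X[1..t]. */
void shuffle_tail(long X[], int t, int n)
{
    int j, k;
    long temp;
    for (j = t; j > t - n; j--) {
        k = (int)(j * uniform()) + 1;     /* random k in 1..j */
        temp = X[k];  X[k] = X[j];  X[j] = temp;
    }
}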

For a discussion of random combinatorial objects of other kinds
(e.g., par\-ti\-tions), see Section 7.2 and/or the book {\sl Combinatorial
Algorithms} by Nijenhuis and Wilf (New York: Academic Press, 1975).

\exbegin{EXERCISES}

\exno 1. [M12] Explain Eq.\ (1).

\exno 2. [20] Prove that Algorithm
S never tries to read more than $N$ records of its input file.

\trexno 3. [22] The $(t + 1)$st item
in Algorithm S is selected with probability $(n - m)/(N - t)$,
not $n/N$, yet the text claims that the sample is unbiased---so
each item should be selected with the {\sl same} probability.
How can both of these statements be true?

\exno 4. [M23] Let $p(m, t)$ be
the probability that exactly $m$ items are selected from among
the first $t$ in the selection sampling technique. Show directly
from Algorithm S that
$$p(m, t) = {t\choose m}{N-t\choose n-m}\left/{N\choose n}\right.,\qquad
\hjust{for }0≤t≤N.$$

\exno 5. [M24] What is the average
value of $t$ when Algorithm S terminates? (In other words, how
many of the $N$ records have been passed,
on the average, before the sample is complete?)

\exno 6. [M24] What is the standard
deviation of the value computed in the previous exercise?

\trexno 7. [M25] Prove that any {\sl
given} choice of $n$ records from the set of $N$ is obtained
by Algorithm S with probability $1/{N\choose n}$. Therefore
the sample is completely unbiased.

\exno 8. [M46] Algorithm S computes one
uniform deviate for each input record it handles. Find a more
efficient way to determine the number of input records to skip
before the first is selected, assuming that $N/n$ is rather
large. (We could iterate this process to select the remaining
$n - 1$ records, thus reducing the number of necessary random deviates
from order $N$ to order $n$.)

\exno 9. [12] Let $n = 3$. If Algorithm
R is applied to a file containing 20 records numbered 1 thru
20, and if the random numbers generated in step R3 are respectively
$$4, 1, 6, 7, 5, 3, 5, 11, 11, 3, 7, 9, 3, 11, 4, 5, 4,$$
which records go into the reservoir? Which are
in the final sample?

\exno 10. [15] Modify Algorithm R so that the reservoir is eliminated,
assuming that the $n$ records of the current sample can be held
in memory.

\trexno 11. [M25] Let $p↓m$ be the probability that exactly $m$
elements are put into the reservoir during the first pass of
Algorithm R\null. Determine the generating function $G(z) = \sum
↓mp↓mz↑m$, and find the mean and standard deviation. (Use the
ideas of Section 1.2.10.)

\exno 12. [M26] The gist of Algorithm P is that any permutation
$π$ can be uniquely written as a product of transpositions in
the form $π = (a↓tt) \ldotsm (a↓33)(a↓22)$, where $1 ≤ a↓j ≤
j$ for $t ≥ j > 1$. Prove that there is also a unique representation
of the form $π = (b↓22)(b↓33) \ldotsm (b↓tt)$, where $1 ≤ b↓j
≤ j$ for $1 < j ≤ t$, and design an algorithm which computes the
$b$'s from the $a$'s in $O(t)$ steps.

\exno 13. [M23] (S. W. Golomb.)\xskip One of the most common ways to
shuffle cards is to divide the deck into
two parts as equal as possible, and to ``riffle'' them together.
(See the discussion of card-playing etiquette in Hoyle's rules
of card games; we read, ``A shuffle of this sort should be made
about three times to mix the cards thoroughly.'') Consider a
deck of $2n - 1$ cards $X↓1$, $X↓2$, $\ldotss$, $X↓{2n-1}$; a ``perfect
shuffle'' $s$ divides this deck into $X↓1$, $X↓2$, $\ldotss$, $X↓n$
and $X↓{n+1}$, $\ldotss$, $X↓{2n-1}$, then perfectly interleaves them
to obtain $X↓1$, $X↓{n+1}$, $X↓2$, $X↓{n+2}$, $\ldotss$, $X↓{2n-1}$, $X↓n$.
The ``cut'' operation $c↑j$ changes $X↓1$, $X↓2$, $\ldotss$, $X↓{2n-1}$
into $X↓{j+1}$, $\ldotss$, $X↓{2n-1}$, $X↓1$, $\ldotss$, $X↓j$. Show that
by combining perfect shuffles and cuts, at most $(2n - 1)(2n
- 2)$ different arrangements of the deck are possible, if $n
> 1$.

\trexno 14. [30] (Ole-Johan Dahl.)\xskip If $X↓k=k$ for $1≤k≤t$
at the start of Algorithm P\null, and if we terminate that algorithm
when $j$ reaches the value $t - n$, the sequence $X↓{t-n+1}$,
$\ldotss$, $X↓t$ is a random permutation of a random combination
of $n$ elements. Show how to simulate the effect of this procedure
using only $O(n)$ cells of memory.

\vfill\eject